A compact tree representation is used during the electronic storage, transmission, and presentation of a structured hypermedia document (25) in a networked computer. All text portions of the documents are pre-processed by a document parser (30), and the resulting document structure is stored in compact and compressed form in a persistent object storage (31), while the document content (32) is available in a compressed and indexable form consistent with full-text retrieval systems. During document delivery, the compact representation of the document (33, 39) is retrieved from the persistent object storage and transferred to a client computer (22), which reconstructs the document and presents it to a user (36). Any arbitrary structured document type can be stored and delivered this way. The CT representation can be partitioned dynamically, allowing the client to retrieve parts of the document incrementally for non-linear or temporally ordered access. The representation can be used for group collaboration applications, including document sharing and document authoring applications.
COMPACT TREE FOR STORAGE AND RETRIEVAL 
OF STRUCTURED HYPERMEDIA DOCUMENTS 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to electronic data storage, and more particularly to storage, 
retrieval, transmission, and presentation of structured hypermedia documents in 
distributed computing systems. 

2. Description of Related Art 

Structured documents are a class of electronic information storage in which the text 
content of a document includes embedded character sequences known as "markup" which 
identify structural elements and attributes or formatting codes for the content. The 
Standard Generalized Markup Language (SGML) is an example of a syntax for storing 
and processing structured documents (International Organization for Standardization, ISO 
International Standard 8879 - Standard Generalized Markup Language, 1985 Geneva, 
Switzerland). The HyTime standard is an example of a markup language for structured 
hypermedia documents (International Organization for Standardization, ISO International 
Standard 10744 - Hypermedia Time-Based Structuring Language (HyTime), 1992, 
Geneva, Switzerland). The Hypertext Markup Language (HTML), which is defined as an 
SGML document type definition (DTD), is a widely used markup language for hypertext 
documents (Berners-Lee, T., and Connolly, D., Hypertext Markup Language - 2.0, 
Internet Engineering Task Force RFC 1866, November 1995). 

Under the SGML standard, a DTD defines the structural components that are required and 
allowed in a particular type of document. Each DTD begins with a declaration of the 
document type, i.e., a statement that assigns an identifier to the particular document type 
(e.g., a DTD defining a magazine may begin with the document type declaration 
"magazine"). All documents of the declared type should be structured according to the 
DTD. The DTD then defines the elements, attributes, entities, and notations that may be 
used to compose a document of the given type. Elements are the components that create 
the logical structure of the document (e.g., a magazine's elements are articles, which may 
consist of text, pictures, and graphical figures or tables). Attributes are the characteristics 
that each element type may take on in a document of the given type (e.g., one attribute 
of an article is the page number on which it begins). Entities may be used to refer to long 
strings of text or to external files (e.g., the term "Johnson article" may refer to an article 
in another magazine). Notations identify non-SGML components and provide instructions 
for using these components when presenting the document. Additional information on 
documents structured according to the SGML syntax may be found in The SGML Primer, 
SoftQuad, Inc., 1995. 

One conventional approach by which structured documents in electronic form are 
delivered involves a client-server division. A client application executing on a local client 
processor retrieves a document in its native encoding from a server software program 
executing on a remote server processor (transfer). The client application then parses the 
document locally according to the grammar of the document structure (parsing), 
combines the output of the parser with local requirements regarding style attributes of 
individual elements (rendering), and presents the document and its content to the user 
interface (display). A retrieved document is transferred in bulk to the client. 

Bandwidth is a scarce resource in present client-server systems, so it would be desirable 
to minimize the amount of data required to be transferred from a server to a client. 
Accordingly, a second approach by which structured documents are delivered is for the 
document server to transmit to the client a compressed form of the document's native 
encoding (compression), using conventional compression techniques such as those 
described in Witten, I. H., Moffat, A., and Bell, T., Managing Gigabytes - Compressing 
and Indexing Documents and Images, NY: Van Nostrand Reinhold 1994. The client 
retrieves the compressed document (transfer), decompresses the document to obtain the 
native encoding (decompression), and then parses the document locally according to the 
grammar of the document structure (parsing), combines the output of the parser with local 
requirements regarding style (rendering), and presents the document and its content to the 
user interface (display). However, a retrieved compressed document still is transferred 
in bulk to the client.

In these conventional approaches, a document authoring tool generates documents in a 
native encoding. Thereafter, the documents are transferred to the server and retained in 
the native encoding. If a document has been structured for time-dependent presentation 
(that is, some elements of the document are to be displayed before other elements), the 
internal scheduling information of the document is not used to schedule transmission of 
document elements from server to client. Instead, the document is transferred in bulk to 
the client for parsing, rendering, and time-dependent display at the client. Further, in the 
conventional approaches, multiple users accessing the same document for simultaneous 
collaboration each must retrieve a new copy of the entire document each time the original 
document is modified by another user. 

Accordingly, it would be useful to improve the efficiency of present client-server systems 
for structured document access. The present invention provides a system and method that 
achieve such improved efficiency. 

SUMMARY OF THE INVENTION 

The present invention provides a system and method of storage, retrieval, transmission, 
and presentation of structured hypermedia documents in client-server distributed 
computing systems. In the preferred embodiment, a network server maintains a persistent 
storage of an arbitrary number of processed structured documents. Each structured 
document is processed and parsed once at the network server, and the result is stored in 
compact tree (CT) form in a persistent object store. The CT form of the document then 
is delivered to requesting client computers as one or more objects. When a document is 
edited at a client computer, the CT form or the edit operations can be transferred from the 
client to the server. 

Since the document is retrieved in pre-parsed and pre-processed form, the client computer 
need not have a document parsing function. The parsing operation is done once at the 
server, and thereafter the resulting representation may be accessed many times by 
different clients. The clients do not need to parse the compact representation, because the 
representation retrieved by the client contains the information needed 
to render the document for presentation. 

The compact tree techniques described above result in significantly smaller data transfers 
than traditional document transfer techniques. The actual bandwidth reductions depend 
upon the size and structure of the structured documents being transferred. Furthermore, 
transferring compact tree representations incrementally provides additional bandwidth 
and network performance improvements. 

Since the CT representation and the original source representation of the document are 
essentially equivalent, the CT representation may replace the source representation, 
thereby saving storage space and cache memory space at the server. Similarly, since 
client computers generally cache documents in source format, caching the CT 
representation instead reduces cache memory usage in the client computers. The CT 
representation also preserves cache storage space at proxy servers that act as 
intermediate servers between the client computers and a remote server. 

WO 97/34240 PCT/US97/04574 

Since the CT representation of a structured document reduces the time needed to access 
the document, the network server is able to sustain more sessions in a given interval. 
Because client computers do not need to parse the document, the clients are able to 
present the document more quickly than when using traditional document transfer 
techniques. 

Full-text and content-based retrieval are important requirements for hypermedia 
document systems. The CT representation of structured documents separates the 
document's structural elements from its content, allowing the content to be stored and 
indexed by conventional full-text or content-based retrieval systems, while the structural 
portions can be queried by a structure-query processing language such as the HyQ query 
language [ISO 1992]. 

The CT representation also allows non-linear, partial-retrieval (incremental), and 
progressive access to documents. Since a document is stored on the server as a compact 
parse tree, client computers may access a subsection of the tree without retrieving and 
processing the entire tree. This speeds client access, particularly for large documents, 
since the compact tree can be partitioned into sub-trees deliverable in an arbitrary order, 
including non-linear and temporal ordering. Even though incremental access is useful in 
supporting non-linear and temporally ordered access to portions of a document, 
conventional document transfer processes do not permit incremental access within a 
single document file. Processing time also is reduced because the syntactical validation 
performed by the parser is performed only once at the server, rather than each time the 
document is accessed by a client. 

In collaborative situations, multiple client computers may simultaneously view, and may 
even modify, the same document. To maintain document consistency among all clients 
during collaboration, changes made to the document at each client are propagated to the 
server and to the other clients. The object-oriented storage model of the document permits 
edit transaction objects to be incorporated into the incremental delivery of the document. 



Advantages of the invention may include one or more of the following: 

• A computer network may be simplified by eliminating the need for document parsers in client computers requesting access to structured documents. 
• A structured document may be parsed once by a network server and then accessed indefinitely by multiple client computers. 
• Network storage and bandwidth requirements may be reduced by transferring a compact representation of a structured document instead of transferring the entire document itself. 
• Network performance may be improved by reducing the time required to access a structured document. 
• Conventional information retrieval systems may be used to store and index the content of a structured document. 
• Modifications to a structured document may be sent to a client user who is viewing the document without retransmitting the entire document. 

The details of the preferred embodiment of the present invention are set forth in the 
accompanying drawings and the description below. Once the details of the invention are 
known, numerous additional advantages, innovations, and changes will become obvious 
to one skilled in the art. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a functional block diagram of a computer network that stores and transmits 
structured documents in a compact tree representation. 

FIGURES 2A and 2B are flow diagrams of a network server's processing of a structured 
document. 

FIGURE 3 is a flow diagram of a network client's processing of a compact tree 
representation of a structured document. 

FIGURE 4 is a structural diagram of the elements of a document type definition (DTD) 
object for a compact tree representation of structured documents. 

FIGURE 5 is a schematic diagram of the elements of an instance object for a compact tree 
representation of a structured document. 

FIGURES 6A and 6B are block diagrams illustrating the storage and transmission format 
of a DTD object. 

FIGURE 7 is a block diagram illustrating the storage and transmission format of the 
instance object. 

FIGURES 8A and 8B are block diagrams of a computer network that stores and transmits 
structured documents in a partitioned compact tree representation. 

FIGURES 9A and 9B are block diagrams of a computer network in which modifications 
to a structured document are transmitted between network computers. 
FIGURES 10A and 10B are a document type definition (DTD) and an example SGML 
document generated according to the DTD, respectively. 

FIGURES 11A and 11B are tables found in compact tree representations of the DTD and 
SGML document of FIGURES 10A and 10B. 

FIGURE 12 is another example SGML document generated according to the DTD of 
FIGURE 10A. 

FIGURES 13A and 13B are tables found in partitioned compact tree representations of 
the SGML document of FIGURE 12 and the corresponding DTD. 



Like reference numbers and designations in the various drawings indicate like elements. 

DETAILED DESCRIPTION OF THE INVENTION 

Throughout this description, the preferred embodiment and examples shown should be 
considered as exemplars, rather than as limitations on the present invention. 

Referring to FIGURE 1 and FIGURES 2A and 2B, in a first embodiment of the present 
invention, a structured document 25 encoded according to a standard syntax, such as 
SGML [ISO 1988], may be shared among the various computers of a computer network 
20. In general, a network server computer 26 will receive a request for the document 25 
from at least one of a set of client computers 22 and 24 (Step 50). Upon receiving the 
request, the server 26 retrieves the document 25, either from an associated storage device 
27 if the server 26 is a computer system that permanently maintains the document 25 
(e.g., the server 26 is an Internet server), or from another computer system if the server 
does not permanently maintain the document (e.g., the server 26 is a proxy server) (Step 
52). The document 25 is accompanied by a DTD file 28 that defines the syntactical 
elements used to convert the document 25 into a compact tree (CT) representation. The 
DTD file 28 allows the document to be reconstructed from the CT representation and to 
be presented as formatted output. 

After receiving the document 25, the server 26 uses an internal parser 30 to parse through 
the SGML document 25, breaking the document 25 into a parse tree 41 representing the 
document's structural hierarchy and a content list 43 representing the data contained in 
the document 25 (Step 54). The server 26 then converts the parse tree 41 and the content 
list 43 into structure and character data (CDATA) tables, respectively (Step 56). The 
structure tables store the document's structural information in compressed form, and the 
CDATA table stores the data content of the document 25 in compressed and indexable 
form. These tables are described in detail below. The server 26 then stores the structure 
and CDATA tables as a compact tree (CT) instance object 32 in a persistent data store 31, 
such as an object-oriented database, an object-oriented file system, or an object-relational 
database (Step 58). The instance object 32 is structured such that related structural 
elements (i.e., the document's elements and their corresponding statements) are located 
together in the compact tree, which allows the clients 22 and 24 and the server 26 to 
access the instance object, and therefore the structured document, incrementally, as 
described below. If the server 26 includes an information retrieval system 35 (Step 60), 
the server 26 passes the CDATA portion of the instance object 32 to the information 
retrieval system 35 for storage (Step 62). The server 26 also compresses the DTD file 28 
into a compact DTD object 34 and stores it in the persistent object store 31 (Step 66) if 
the server 26 has not already done so while previously processing a document of the same 
type (Step 64). The server 26 uses standard compression techniques to create the instance 
object 32 and the DTD object 34. 
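The split of a document into structure and content (Steps 54 and 56) can be illustrated with a minimal Python sketch. It assumes well-formed XML as a stand-in for SGML (the Python standard library has no SGML parser) and ignores attributes, DTD processing, and compression; `build_tables` and the plain-list tables are hypothetical. Each element becomes an instance-table row of (GI index, first child, number of children), and its children occupy consecutive child-table slots, each either an element reference (type bit 1) or a pseudo element pointing into the CDATA list (type bit 0).

```python
import xml.etree.ElementTree as ET


def build_tables(xml_text):
    """Hypothetical sketch: split a document into GI names, instance, child, and CDATA tables.

    XML stands in for SGML here, and attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    gi_names, instance, child, cdata = [], [], [], []

    def gi_index(tag):
        # Store each generic identifier once and refer to it by index.
        if tag not in gi_names:
            gi_names.append(tag)
        return gi_names.index(tag)

    def add_element(elem):
        idx = len(instance)
        instance.append([gi_index(elem.tag), 0, 0])  # [GI, first child, child count]
        # Ordered children: leading text, then each sub-element and the text after it.
        items = []
        if elem.text and elem.text.strip():
            items.append(("text", elem.text))
        for sub in elem:
            items.append(("elem", sub))
            if sub.tail and sub.tail.strip():
                items.append(("text", sub.tail))
        first = len(child)
        instance[idx][1], instance[idx][2] = first, len(items)
        child.extend([None] * len(items))  # reserve consecutive slots for this element
        for offset, (kind, value) in enumerate(items):
            if kind == "text":
                cdata.append(value)
                child[first + offset] = (0, len(cdata) - 1)  # pseudo element -> CDATA
            else:
                child[first + offset] = (1, add_element(value))  # element -> instance row
        return idx

    add_element(root)
    return gi_names, instance, child, cdata


names, inst, kids, text = build_tables(
    "<memo><to>Ann</to><body>Hi <b>there</b>!</body></memo>")
print(inst)  # [[0, 0, 2], [1, 2, 1], [2, 3, 3], [3, 6, 1]]
print(kids)  # [(1, 1), (1, 2), (0, 0), (0, 1), (1, 3), (0, 3), (0, 2)]
```

Reserving a block of child slots before recursing is what keeps all children of one element consecutive, as the child table description requires; whitespace-only text runs are dropped as a simplification.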

Instead of delivering the entire SGML document 25 to the requesting client 22 or 24, the 
server 26 delivers only a copy 33 of the CT instance object 32 and, if necessary, a copy 
39 of the CT DTD object 34. If the client 22 or 24 requests only a portion of the 
document (Step 68), the server 26 sends only the sub-objects of the CT instance object 
32 relating to the requested portion (Step 70). Otherwise, the server 26 transmits the 
entire CT instance object to the requesting client 22 or 24 (Step 72). Furthermore, the 
server 26 transmits the DTD object to the client 22 or 24 (Step 76) only when the client 
specifically requests the DTD object 34 (Step 74). Since the DTD may be any SGML-conforming 
definition, the client generally will need to retrieve a copy 39 of the DTD 
object 34 with the document instance object 32. However, if the client 22 or 24 can 
process the document instance 32 without the DTD object 34, or if the client 22 or 24 
already has a copy 39 of the DTD object 34, then only a copy 33 of the document instance 
object 32 need be transmitted to the client 22 or 24. The server 26 then waits for the next 
incremental or full-document request from the client 22 or 24 (Step 78). 
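The request handling in Steps 68 through 76 reduces to a small dispatch. The following is a minimal sketch, assuming a hypothetical in-memory `CTStore` in which each instance object has already been split into numbered sub-objects; the network transport and the real persistent object store are omitted.

```python
from dataclasses import dataclass


@dataclass
class CTStore:
    """Hypothetical stand-in for the persistent object store 31."""
    instances: dict  # doc_id -> {sub_object_id: bytes}
    dtds: dict       # dtd_id -> bytes


def handle_request(store, doc_id, dtd_id, sub_objects=None, want_dtd=False):
    """Hypothetical handler: return the objects to send for one client request."""
    parts = store.instances[doc_id]
    if sub_objects is not None:
        # Steps 68-70: partial request, send only the requested sub-objects.
        payload = [parts[key] for key in sub_objects]
    else:
        # Step 72: send the whole instance object (all sub-objects in order).
        payload = [parts[key] for key in sorted(parts)]
    if want_dtd:
        # Steps 74-76: the DTD object goes out only when the client asks for it.
        payload.append(store.dtds[dtd_id])
    return payload


store = CTStore({"d1": {0: b"root", 1: b"sec1", 2: b"sec2"}}, {"memo": b"dtd"})
print(handle_request(store, "d1", "memo", sub_objects=[2]))  # [b'sec2']
```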

If a document is structured according to a syntax that the server 26 does not recognize, 
the server 26 may follow one of several approaches. First, the server 26 may reject the 
document completely and notify the requesting client that the document could not be 
delivered. Second, if only a portion of the document is unrecognizable, the server 26 may 
ignore the unrecognizable portion and transmit the rest. Third, the parser 30 may replace 
the unrecognizable syntax with an acceptable syntax. Fourth, the server 26 may transmit 
the unrecognizable syntax to the client as character data (CDATA). 

Referring also to FIGURE 3, after requesting the document or a portion thereof from the 
server 26, the client 22 or 24 receives a copy 33 of the CT instance object 32 or of the 
appropriate sub-objects (Step 82) and stores it in a local storage area 37, such as cache 
memory or a hard disk drive (Step 84). If the client 22 or 24 has not already received the 
DTD object 34 in response to an earlier request (Step 86), that client requests the DTD 
object 34 (Step 88) and, after receiving it, stores the object in the local storage area 37 
(Step 90). Using the DTD object as a guide, a CT processor 45 or 47 in the client 22 or 
24 then traverses the CT instance object (Step 92) and converts that object into a 
formatted document (Step 94). The client 22 or 24 then presents the formatted document 
to an output device 36 or 38, such as a video display or a printer (Step 96). 
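On the client side, Steps 82 through 90 amount to caching each received instance object and fetching a DTD object only once per document type. In this minimal sketch, `CTClient` is hypothetical and `fetch` is an assumed callable standing in for the network request; traversal and formatting (Steps 92 to 96) are not shown.

```python
class CTClient:
    """Hypothetical client that caches instance objects and DTD objects locally."""

    def __init__(self, fetch):
        self.fetch = fetch  # assumed callable(kind, key) -> object; stands in for the network
        self.cache = {}     # plays the role of local storage area 37

    def open(self, doc_id, dtd_id):
        instance = self.fetch("instance", doc_id)    # Step 82: receive instance object
        self.cache[("instance", doc_id)] = instance  # Step 84: store it locally
        if ("dtd", dtd_id) not in self.cache:        # Step 86: DTD already received?
            self.cache[("dtd", dtd_id)] = self.fetch("dtd", dtd_id)  # Steps 88-90
        return instance, self.cache[("dtd", dtd_id)]


calls = []


def fake_fetch(kind, key):
    calls.append((kind, key))
    return f"{kind}:{key}"


client = CTClient(fake_fetch)
client.open("d1", "memo")
client.open("d2", "memo")  # same document type, so no second DTD request
print(calls)  # [('instance', 'd1'), ('dtd', 'memo'), ('instance', 'd2')]
```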

In a second embodiment of the present invention, documents are pre-processed in the 
server 26 so that they are available in processed form before they are requested by a client. In a 
third embodiment of the present invention, as documents are requested by clients and 
processed in the server 26 as above, the processed forms are stored in the server so that they 
are available in processed form for future client requests. 

Referring to FIGURE 4, a CT DTD object 34 preferably is organized into several tables, 
each of which contains a portion of the DTD information needed for recreation 
(rendering) and presentation of the structured document. In the preferred embodiment, 
the tables in the DTD object 34 include a generic identifier (GI) table 102, a name table 
104, an attribute declaration table 106, an attribute enumeration table 108, a notation table 
110, an entity table 112, and a character data (CDATA) table 114. Preferably, DTD 
information that is needed only at parse time (e.g., the content model of each generic 
identifier) is discarded and is not placed into the DTD object 34. The name table 104 and 
the CDATA table 114 preferably consist of text strings; the other tables preferably consist 
of constant-length numerical fields. In the preferred embodiment, the name table 104, 
notation table 110, entity table 112, and CDATA table 114 are "common" tables that may 
be shared by the DTD object 34 and the instance object 32. The GI table 102, attribute 
declaration table 106, and attribute enumeration table 108 are DTD-specific tables. 

To simplify access to the information stored in the DTD object 34, each table preferably 
has constant-size entries which may be identified by simple numeric indices. Client 
computers can find information in a table by multiplying the entry index for the desired 
information by the predetermined size of the entries. Variable-length information is 
stored as consecutive entries in either the name table 104 or the CDATA table 114. As 
a result, all variable-length information may be represented by two numeric values: one 
value representing the index of the first table entry containing a portion of the desired 
information, and the other value representing the number of entries required to store it. 
This arrangement trades some space efficiency for access efficiency. One of ordinary 
skill in the art would recognize that the tables may be designed differently to strike a 
different balance between space efficiency and access efficiency. 
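With constant-size entries, an entry's byte offset is simply its index multiplied by the entry size. A minimal sketch using Python's `struct` module, with a hypothetical `FixedTable` class and an arbitrary record layout of two unsigned 16-bit fields (here, a first-entry index and an entry count describing one variable-length string):

```python
import struct


class FixedTable:
    """Hypothetical helper: constant-size records, entry i starts at byte i * record size."""

    def __init__(self, fmt):
        self.record = struct.Struct(fmt)
        self.data = bytearray()

    def append(self, *fields):
        self.data += self.record.pack(*fields)
        return len(self.data) // self.record.size - 1  # index of the new entry

    def __getitem__(self, index):
        # No scanning needed: multiply the index by the fixed record size.
        return self.record.unpack_from(self.data, index * self.record.size)

    def __len__(self):
        return len(self.data) // self.record.size


refs = FixedTable("<HH")  # assumed layout: (first entry, number of entries)
refs.append(0, 1)         # a string occupying one entry
refs.append(1, 3)         # a string occupying three entries starting at entry 1
print(refs[1])            # (1, 3)
```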

In the preferred embodiment, the GI table 102 maintains a list of the generic identifiers 
defined in the DTD. The GI table 102 preferably contains at least four fields for each 
generic identifier. The first field 120 indicates the index of the generic identifier string 
stored in the name table 104. The second field 122 indicates the index of the first attribute 
stored in the attribute declaration table 106 for each generic identifier. As described below, 
all attributes associated with a generic identifier are stored consecutively in the attribute 
declaration table 106, fixed attributes followed by non-fixed attributes. The third field 124 
and fourth field 126 of the GI table 102 indicate the number of fixed attributes and the 
number of non-fixed attributes, respectively, associated with each generic identifier. If 
desired, other information may be stored in the GI table 102. 
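Because each generic identifier's attributes are stored consecutively, fixed ones first, the second through fourth GI fields are enough to slice out both groups. A minimal sketch with hypothetical names (`GIEntry`, `attributes_for`), where the attribute declaration table is reduced to a plain list of attribute names:

```python
from typing import NamedTuple


class GIEntry(NamedTuple):
    """Hypothetical row type for the GI table 102."""
    name: int        # field 120: index of the GI string in the name table
    first_attr: int  # field 122: first attribute in the attribute declaration table
    n_fixed: int     # field 124: number of fixed attributes
    n_nonfixed: int  # field 126: number of non-fixed attributes


def attributes_for(gi_table, attr_decls, gi):
    """Return (fixed, non_fixed) declarations for generic identifier number gi."""
    e = gi_table[gi]
    split = e.first_attr + e.n_fixed
    fixed = attr_decls[e.first_attr:split]
    non_fixed = attr_decls[split:split + e.n_nonfixed]
    return fixed, non_fixed


# Hypothetical example: GI 0 has "version" (fixed), then "status" and "lang".
attr_decls = ["version", "status", "lang", "href"]
gi_table = [GIEntry(name=0, first_attr=0, n_fixed=1, n_nonfixed=2),
            GIEntry(name=1, first_attr=3, n_fixed=0, n_nonfixed=1)]
print(attributes_for(gi_table, attr_decls, 0))  # (['version'], ['status', 'lang'])
```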
The name table 104 preferably stores a single copy of the text string associated with each 
generic identifier. Each string contained in the table 104 is represented by an index that 
identifies the position of the string in the table 104. The indices are represented by a 
binary value containing the least number of bits required to identify every generic 
identifier (i.e., ⌈log2 N⌉, where N is the number of strings in the name table 104). By storing 
all identifier strings in the name table, the DTD object 34 reduces the storage space 
required to represent all string identifiers. 
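The index width above is the ceiling of log2 N. Python's `int.bit_length` computes it exactly without floating point; `index_bits` is a hypothetical name, and the one-bit minimum for a single-entry table is an assumption of this sketch.

```python
def index_bits(n):
    """Hypothetical helper: fewest bits giving each of n table entries a distinct index."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2; use at least one bit.
    return max(1, (n - 1).bit_length())


print([index_bits(n) for n in (1, 2, 5, 8, 9, 300)])  # [1, 1, 3, 3, 4, 9]
```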

The attribute declaration table 106 maintains a list of the attributes associated with each 
generic identifier, their types, and their default values. The table 106 preferably includes 
at least three fields: a name field 128, a type field 130, and a value field 132. The name 
field 128 indicates the index of the attribute name stored in the name table 104. The type 
field 130 indicates whether the attribute is an identifier (ID), an IDREF, a name, an entity 
reference, an enumeration, or CDATA. If the attribute is an identifier (ID or IDREF), the 
value field 132 stores the index of the identifier string in the name table 104. If the 
attribute is an enumeration, the value field 132 indicates the index of a list of possible 
enumeration values stored in the attribute enumeration table 108. If the attribute is 
CDATA, the value field 132 indicates the index of the default value in the CDATA table 
114. Because this information is stored in the attribute declaration table 106, the 
structured document and the CT instance object need list only those attributes whose values 
differ from their defaults. The attribute declaration table 106 also eliminates the need to 
store information about fixed attributes in the document instance object. 

The attribute enumeration table 108 maintains a list of possible enumeration values for 
each attribute declared in the attribute declaration table 106. The attribute enumeration 
table preferably contains at least two fields. A NameTable Entry field 133 stores the 
index of a first enumeration value in the name table 104. A No. of Entries field 134 stores 
the number of possible enumeration values. These enumeration values are stored 
consecutively in the name table 104, starting at the index stored in the NameTable Entry 
field 133. Optionally, an index field 135 is also provided to reference multiple entries in 
the attribute enumeration table 108. (Throughout these examples, the index columns are 
shown for clarity but are not explicitly part of the CT representation.) 
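How the value field 132 is interpreted depends on the type field 130, and for enumerations it leads through the attribute enumeration table 108 to consecutive name-table entries. The sketch below covers only the cases the text specifies (ID or IDREF, enumeration, CDATA); the `AttrType` encoding, the tuple layout of a declaration row, and `resolve_value_field` are hypothetical.

```python
from enum import Enum


class AttrType(Enum):
    """Hypothetical encoding of the type field 130."""
    ID = 0
    IDREF = 1
    NAME = 2
    ENTITY = 3
    ENUMERATION = 4
    CDATA = 5


def resolve_value_field(decl, name_table, enum_table, cdata_table):
    """Hypothetical: follow the value field 132 of a row (name_index, type, value)."""
    _, attr_type, value = decl
    if attr_type in (AttrType.ID, AttrType.IDREF):
        return name_table[value]                # identifier string in the name table
    if attr_type is AttrType.ENUMERATION:
        first, count = enum_table[value]        # fields 133 and 134
        return name_table[first:first + count]  # possible values, stored consecutively
    if attr_type is AttrType.CDATA:
        return cdata_table[value]               # default value in the CDATA table
    raise NotImplementedError(f"value field use for {attr_type.name} is not specified")


name_table = ["memo", "status", "draft", "final", "m1"]
enum_table = [(2, 2)]  # "status" may be "draft" or "final"
cdata_table = ["1.0"]
print(resolve_value_field((1, AttrType.ENUMERATION, 0), name_table, enum_table, cdata_table))
# ['draft', 'final']
```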

The notation table 110 preferably includes at least three fields: a name field 140, a type 
field 142, and a value field 144. The name field 140 stores an index to the notation 
identifier string stored in the name table 104 for each notation. The type field 142 
contains a flag (preferably a single bit) for each notation that indicates whether the 
notation is a system or public notation. The value field 144 stores an index to the 
corresponding notation value in the CDATA table 114. 

The entity table 112 preferably consists of at least four fields: a name field 146, a type 
field 148, a notation field 150, and a data field 152. The name field 146 stores an index 
to the entity identifier string stored in the name table 104 for each entity. The type field 
148 indicates whether the entity is notation data (NDATA), character data (CDATA), or 
specific character data (SDATA), each of which is known in the art. If the entity type is 
NDATA, the notation field 150 stores an index to the corresponding notation string in the 
notation table 110; otherwise, the data field 152 stores an index to a corresponding entity 
definition in the CDATA table 114. 

The CDATA table 114 preferably stores unparsed character data, such as notation values, 
external entity values, and attribute values. Like the name table, the CDATA table 114 
stores each variable-length string in consecutive constant-length entries that may be 
indexed easily. After the DTD object 34 is fully constructed, the CDATA table 114 is 
compressed using a conventional text compression technique. 
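Storing strings as consecutive constant-length entries and then compressing the whole table might look like the following minimal sketch. The 8-byte entry width, NUL padding, and the use of `zlib` as the conventional text compression technique are assumptions; `build_cdata_table` and `read_cdata` are hypothetical names.

```python
import zlib

ENTRY_SIZE = 8  # hypothetical entry width in bytes


def build_cdata_table(strings):
    """Hypothetical: pack strings into consecutive fixed-size entries, then compress."""
    table = bytearray()
    refs = []
    for s in strings:
        data = s.encode("utf-8")
        count = max(1, -(-len(data) // ENTRY_SIZE))       # ceiling division
        refs.append((len(table) // ENTRY_SIZE, count))    # (first entry, entry count)
        table += data.ljust(count * ENTRY_SIZE, b"\x00")  # NUL-pad the last entry
    return zlib.compress(bytes(table)), refs              # zlib as an assumed compressor


def read_cdata(compressed, first, count):
    """Decompress the table and read one string by (first entry, entry count)."""
    table = zlib.decompress(compressed)
    raw = table[first * ENTRY_SIZE:(first + count) * ENTRY_SIZE]
    return raw.rstrip(b"\x00").decode("utf-8")


blob, refs = build_cdata_table(["1.0", "http://example.com/logo.gif"])
print(refs)                        # [(0, 1), (1, 4)]
print(read_cdata(blob, *refs[1]))  # http://example.com/logo.gif
```

A real client would decompress the table once and keep it rather than decompressing on every lookup; the per-call decompression here only keeps the sketch short.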

Referring to FIGURE 5, a CT instance object 32 also is composed of several tables, each 
of which contains a portion of the information required to reconstruct the document 
instance. In the preferred embodiment, the tables specific to the instance object 32 include 
at least an instance table 160, an attribute value table 162, and a child table 164, each of 
which is described below. These tables describe the document-specific element hierarchy 
and identify any attributes that are not set to the corresponding default values. The 
instance object 32 also may use the four common tables in the DTD object (i.e., CDATA, 
name, notation, and entity). The instance object 32 preferably will include an entity table 
(i.e., will use the common entity table) only when new external entities are used in the 
document instance. Likewise, the instance object 32 will include a notation table, GI 
table, or CDATA table only when new notations, identifiers, or CDATA, respectively, 
are used in the document instance. The index values for the instance object elements in 
the common tables are determined by treating each table as a continuation of the 
corresponding table in the DTD object. 
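Treating an instance object's common table as a continuation of the DTD's table means that index values below the DTD table's length refer to DTD entries, while higher values refer to entries added by the instance. A minimal sketch with a hypothetical `CommonTable` class:

```python
class CommonTable:
    """Hypothetical view of a common table (name, notation, entity, or CDATA)."""

    def __init__(self, dtd_entries):
        self.dtd_entries = list(dtd_entries)  # entries owned by the DTD object
        self.instance_entries = []            # entries added by this document instance

    def add(self, entry):
        """Append an instance entry; its index continues after the DTD entries."""
        self.instance_entries.append(entry)
        return len(self.dtd_entries) + len(self.instance_entries) - 1

    def __getitem__(self, index):
        if index < len(self.dtd_entries):
            return self.dtd_entries[index]
        return self.instance_entries[index - len(self.dtd_entries)]


names = CommonTable(["memo", "to", "body"])
new_index = names.add("urgent")     # a name first used in this document instance
print(new_index, names[new_index])  # 3 urgent
```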

The instance table 160 indicates the elements occurring in the document instance. The 
instance table 160 preferably includes at least five information fields for each element: 
a generic identifier (GI) field 166, a flag field 168, a first attribute field 170, a first child 
field 172, and a number-of-children field 174. The GI field 166 stores an index to a 
corresponding generic identifier in the GI table 102 of the DTD object. The flag field 168 
contains a binary bit vector in which each bit represents a non-fixed attribute in the 
corresponding element. If the bit corresponding to a particular attribute is set, that 
attribute does not have the default value, but rather its value is determined by the value 
found in the first attribute field 170. The first attribute field 170 stores the index of the 
first non-default attribute value stored in the attribute value table 162 for the 
corresponding element. The attributes for which the corresponding bits in the flag field 
168 are set are stored consecutively in the attribute value table 162, beginning at the 
index stored in the first attribute field 170. All attributes which are set to the 
corresponding default values are represented by cleared bits in the flag field 168 and are omitted in 
the attribute value table 162. The bits in the flag field bit vector and the attribute values 
in the attribute value table 162 appear in the same order that the non-fixed attributes for 
the corresponding generic identifier appear in the attribute declaration table 106. The first 
child field 172 stores the index of the corresponding element's first child element (or sub- 
element) in the child table 164. The number-of-children field 174 indicates how many 
child elements (sub-elements) the corresponding element has. All sub-elements of an element are 
stored consecutively and in proper hierarchical order in the child table 164. 
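Decoding an element's attributes combines the flag field 168, the first attribute field 170, and the non-fixed attribute declarations of its generic identifier. In this minimal sketch, bit i of the flag vector (least significant bit first, an assumption) corresponds to the i-th non-fixed attribute in declaration order, and attribute values are shown as strings rather than table indices for readability; `element_attributes` is a hypothetical name.

```python
def element_attributes(non_fixed_decls, flags, first_attr, attr_values):
    """Hypothetical decoder: return {name: value} for one element's non-fixed attributes.

    non_fixed_decls: (name, default) pairs in attribute declaration table order.
    flags: flag field 168 as an int; bit i set means attribute i is stored explicitly.
    first_attr: first attribute field 170, an index into attr_values (table 162).
    """
    result = {}
    next_value = first_attr
    for bit, (name, default) in enumerate(non_fixed_decls):
        if (flags >> bit) & 1:
            result[name] = attr_values[next_value]  # explicit values are consecutive
            next_value += 1
        else:
            result[name] = default                  # cleared bit: default applies
    return result


decls = [("status", "draft"), ("lang", "en"), ("level", "1")]
attr_values = ["x", "y", "z", "w", "final", "3"]
print(element_attributes(decls, 0b101, 4, attr_values))
# {'status': 'final', 'lang': 'en', 'level': '3'}
```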

The attribute value table 162 stores at least a value for every attribute that is not set to the 
corresponding default value. Each entry in this table consists of one value: an index (for 
example, to an identifier string in the name table 104) representing the attribute's actual, 
non-default value. The 
attribute name and attribute type need not be stored in this table because this information 
is stored in the attribute declaration table in the DTD object. 

The child table 164 preserves the hierarchical structure of the elements occurring in the 
document instance. Each entry in the child table 164 is represented by at least two 
information fields: a child field 176 containing a single bit indicating whether the entry 
is itself an element or is a pseudo element, and a value field 178 storing an index to the 
instance table 160 if the entry is an element or an index to the CDATA table 114 if the
entry is a pseudo element. A child element contains everything located between a matching
pair of start and end tags found within the parent element, while the corresponding pseudo
elements contain the content data located before the start tag and after the end tag, respectively.
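
A child table entry can be pictured as one packed integer. In the Python sketch below, the one-bit child field 176 is assumed to occupy the lowest bit and the value field 178 the remaining bits; the patent does not fix the bit placement, and `pack_child`, `unpack_child` and `resolve_child` are illustrative names.

```python
ELEMENT, PSEUDO = 0, 1  # assumed encoding of the one-bit child field 176


def pack_child(kind: int, index: int) -> int:
    """Pack a child table entry: low bit = child field 176, upper bits = value field 178."""
    return (index << 1) | kind


def unpack_child(word: int) -> tuple:
    return word & 1, word >> 1


def resolve_child(word, instance_table, cdata_table):
    """Follow a child entry to the instance table (element) or CDATA table (pseudo element)."""
    kind, index = unpack_child(word)
    if kind == ELEMENT:
        return "element", instance_table[index]
    return "pseudo", cdata_table[index]


# Content data, then a sub-element, then more content data inside one parent element.
words = [pack_child(PSEUDO, 0), pack_child(ELEMENT, 1), pack_child(PSEUDO, 1)]
for w in words:
    print(resolve_child(w, ["para", "emph"], ["Hello ", "!"]))
```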

Each field in the tables of the instance object 32 preferably contains the smallest number
of bits possible to represent the required information. For example, the GI field 166 of the
instance table 160 should contain only enough binary bits to represent the highest index
in the GI table 102 of the DTD object. Similarly, the attribute value table 162 and the
child table 164 are large enough only to index the respective tables in the DTD object. In
the preferred embodiment the bit vectors in the flag field 168 of the instance table 160
must include one bit for each non-fixed attribute in the element type having the most
non-fixed attributes. The number-of-children field 174 must be large enough only to indicate
the number of sub-elements in the element having the most sub-elements. Each child
table entry must have enough bits to store the highest index value for the CDATA table
114 and the instance table 160, plus an additional bit to indicate the child type. Each entry
in the attribute value table 162 must be large enough to store the highest index in the
name table 104, CDATA table 114, entity table 112, or attribute enumeration table 108,
whichever is larger. The entries in the attribute value table 162 may have more bits than
the value field 132 of the attribute declaration table 106 because the document instance
may have added additional information to the name table 104 and the CDATA table 114.
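
The sizing rules above reduce to computing the bit length of the largest value a field must hold. A minimal Python sketch, assuming table sizes are given as entry counts; the helper names are hypothetical.

```python
def field_width(max_value: int) -> int:
    """Smallest number of bits able to hold every value in 0..max_value."""
    return max(1, max_value.bit_length())


def child_entry_width(cdata_entries: int, instance_entries: int) -> int:
    # enough bits for the highest index in either target table, plus the child-type bit
    return field_width(max(cdata_entries, instance_entries) - 1) + 1


def attr_value_width(name: int, cdata: int, entity: int, enum: int) -> int:
    # an attribute value may index any of these four tables, whichever is larger
    return field_width(max(name, cdata, entity, enum) - 1)


def flag_width(non_fixed_counts) -> int:
    # one bit per non-fixed attribute of the element type having the most of them
    return max(non_fixed_counts, default=0)


def children_width(child_counts) -> int:
    # large enough for the element with the most sub-elements
    return field_width(max(child_counts, default=0))


print(child_entry_width(13, 14))  # 5
```

With 13 CDATA entries and 14 instance table entries (the last indices in the slide show example are 12 and 13, respectively), the highest index is 13, which needs 4 bits, plus 1 bit for the child type.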

The last entry 180 in the instance table 160 is unique in that it provides special 
information required by client computers to recreate the document instance. This entry 
preferably has at least one sub-element. Additional elements are pseudo elements 
containing any SGML processing instructions defined in the DTD. SGML processing 
instructions, which are known in the art, provide system specific information that must 
be used to reconstruct the document instance and therefore must be included in the 
instance object. The SGML processing instructions are followed by the first child that is 
a true element (as opposed to a pseudo element), which indicates the top level element 
of the document. Any remaining children of the last instance table entry 180 may be used
by the server to pass optional parameters, such as PI entity definitions, that help the server 
and the client optimize or customize delivery of the document instance. 

The entries in the document instance table 160 are ordered according to a recursive, 
depth-first search of the parse tree. As a result, entries representing components of the 
same element occupy consecutive locations in the name table 104, entity table 112, 
CDATA table 1 14, instance table 160, attribute value table 162, and child table 164. Also, 
the entries representing a child element preferably appear before the entries of subsequent 
children of the same parent element. Furthermore, the entries representing an element's
children preferably immediately precede the element's own entries in the document 
instance table 160. 
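
This ordering is a post-order traversal: an element's children (and their sub-trees) are emitted before the element itself, and the element's own child table entries are appended as one consecutive run. The Python sketch below uses a hypothetical model (dict entries, `("element" | "pseudo", index)` child pairs), ignores attributes, and appends a simplified final entry whose only child is the top-level element, without the processing-instruction pseudo elements described above.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Element nodes or str (character data)


def flatten(root):
    """Flatten a parse tree into instance, child and CDATA tables in post-order."""
    instance, child, cdata = [], [], []

    def visit(node):
        refs = []
        for c in node.children:
            if isinstance(c, str):            # character data -> pseudo element
                cdata.append(c)
                refs.append(("pseudo", len(cdata) - 1))
            else:                             # sub-element: emit its whole sub-tree first
                refs.append(("element", visit(c)))
        first_child = len(child)
        child.extend(refs)                    # an element's children are consecutive
        instance.append({"name": node.name, "first_child": first_child,
                         "n_children": len(refs)})
        return len(instance) - 1

    top = visit(root)
    # Simplified final entry: its single child locates the top-level element.
    child.append(("element", top))
    instance.append({"name": None, "first_child": len(child) - 1, "n_children": 1})
    return instance, child, cdata


def item(kind):
    return Element("item", [Element(kind)])


show = Element("slideshow", [
    Element("slide", [item("audio"), item("image"), item("image"), item("audio")]),
    Element("slide", [item("image")]),
])
instance, child, cdata = flatten(show)
print([e["name"] for e in instance])
```

Run on the slide show structure of the Example section, this places the "slideshow" element at instance index 12 with its two children at child table indices 10 and 11, matching the indices the Example section reports.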

WO 97/34240  PCT/US97/04574

Referring to FIGURES 6A and 6B, the server stores and transmits the DTD object in a
format understood by the client computers. The preferred format of the DTD-specific
portion 200 of the DTD object is shown in FIGURE 6A. The server first stores a fixed
length integer field 202 representing the number of enumerations contained in the
attribute enumeration table 108 (FIGURE 4). The server then stores a bit-packed version
of the attribute enumeration table 108 in a variable length field 206. Following the
attribute enumeration information is a fixed length integer field 208 indicating the number
of entries in the attribute declaration table 106 (FIGURE 4) and a variable length field
210 containing a bit-packed version of the attribute declaration table 106. The server then
stores a fixed length field 212 indicating the number of entries in the generic identifier
(GI) table 102 (FIGURE 4), followed by a variable length field 214 containing a
bit-packed version of the GI table 102.

The preferred format of the common portion 220 of the DTD object (i.e., the portion that
may contain document instance information as well as DTD information) is shown in
FIGURE 6B. The first two fields are a fixed length integer field 222 indicating the
number of bytes in the compressed CDATA table 114 (FIGURE 4) and a variable length
field 224 containing the compressed CDATA table 114. The next two fields are a fixed
length integer field 226 indicating the number of entries in the name table 104 (FIGURE
4) and a variable length field 228 containing the strings of the name table 104. These
fields are followed by a fixed length integer field 230 indicating the number of entries in
the notation table 110 (FIGURE 4) and a variable length field 232 containing a bit-packed
version of the notation table 110. The last two fields of the common portion 220 of the
DTD object are a fixed length integer field 234 indicating the number of entries in the
entity table 112 (FIGURE 4) and a variable length field 236 containing a bit-packed
version of the entity table 112.

Referring to FIGURE 7, the server also stores and transmits the instance object 32 in a
format understood by the client computers. The server preferably first stores a fixed
length integer field 242 indicating the number of entries in the attribute value table 162
(FIGURE 5), followed by a variable length field 244 containing a bit-packed version of
the attribute value table 162. The next two fields include a fixed length integer field 246
indicating the number of entries in the child table 164 (FIGURE 5) and a fixed length
integer field 248 indicating the number of entries in the instance table 160 (FIGURE 5).
These fixed length fields are followed by two variable length fields, the first field 250
containing a bit-packed version of the child table 164, and the second field 252 containing
a bit-packed version of the instance table 160. As discussed above, the last entry in the
instance table 160 identifies the DTD object corresponding to the instance object 32 and
provides instructions for reconstructing the document instance. The client computers
automatically retrieve the last entry in the instance table 160 to begin the reconstruction
process.
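
The field order of FIGURE 7 can be sketched in Python. The text fixes the order of the fields but not the integer width, byte order or bit order, so this sketch assumes 32-bit big-endian counts and most-significant-bit-first packing, zero-padded to whole bytes. It also assumes each instance table entry has already been combined into a single integer. `pack_table`, `unpack_table` and `serialize_instance_object` are hypothetical helpers.

```python
import struct


def pack_table(values, width):
    """Bit-pack unsigned integers of a fixed width, most significant bit first."""
    acc, nbits = 0, 0
    for v in values:
        if v < 0 or v >> width:
            raise ValueError(f"{v} does not fit in {width} bits")
        acc = (acc << width) | v
        nbits += width
    pad = (-nbits) % 8                      # zero-fill to a whole number of bytes
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_table(data, count, width):
    """Inverse of pack_table: recover `count` entries of `width` bits each."""
    acc, total = int.from_bytes(data, "big"), len(data) * 8
    mask = (1 << width) - 1
    return [(acc >> (total - (i + 1) * width)) & mask for i in range(count)]


def serialize_instance_object(attr_values, child, instance, widths):
    """Fields 242-252 in the order of FIGURE 7, with 32-bit big-endian counts (assumed)."""
    return (struct.pack(">I", len(attr_values))                  # field 242
            + pack_table(attr_values, widths["attr"])             # field 244
            + struct.pack(">II", len(child), len(instance))       # fields 246, 248
            + pack_table(child, widths["child"])                  # field 250
            + pack_table(instance, widths["instance"]))           # field 252


blob = serialize_instance_object([1, 2], [3], [4, 5], {"attr": 2, "child": 3, "instance": 4})
print(len(blob), blob.hex())
```

A reader of the blob also needs the entry widths to find where each variable length field ends. How those widths reach the client is not described here, so the sketch simply passes them in.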

Referring to FIGURES 8A and 8B, the server 26 may create a CT instance object 32 in
a manner that allows incremental delivery of the instance object 32 in either linear or
non-linear order. Incremental delivery permits a client to retrieve and present portions of a
document that are needed immediately for viewing regardless of where they occur in the
document. For example, the client may need to begin its presentation of a document with
a hyperlink that occurs in the middle of the document. Incremental delivery also permits
a progressive-style display of a document, such as displaying all of the document's main
headings before displaying its subheadings and body. Incremental access also allows the
client to retrieve additional portions of a document only as the user attempts to view
them, so that the server must transfer only those portions of the document that are needed
by the user. This makes better use of system resources when, for example, the
user hyperlinks to a new document before entirely viewing the current document. 
Incremental delivery also allows the client to access portions of a temporally organized 
document, so that parts of a document can be retrieved in a specific time order. 
Incremental transfer is important for hypertext access in which the user is likely to browse 
through documents without viewing their entire contents. 

As shown in FIGURES 8A and 8B, the server 26 creates an instance object 32 ready for
incremental delivery by effectively dividing the associated compact tree into sub-trees.
The server 26 divides the compact tree by partitioning the tables in the CT instance object
32. Because all of the information about an element and its children is stored as
consecutive entries in the instance object tables, these tables may be partitioned
easily. To do so, the parser 30 generates an index 260 of element boundaries that
indicates the first and last entries in each instance object table containing information
about each element and its corresponding sub-elements. The index 260 of element
boundaries preferably consists of at least three sub-indices: a first sub-index 262 that
indicates the index bounds of information stored in the attribute value table 162 (FIGURE
5) for each element and its corresponding sub-elements; a second sub-index 264 that
indicates the index bounds of child information stored in the child table 164 (FIGURE
5) for each element; and a third sub-index 266 that indicates the index bounds of entries
in the document instance table 160 (FIGURE 5) for each element and its sub-elements.
Alternatively, the server 26 may divide the document into a uniform set of equally-sized
partitions without regard to corresponding elements and sub-elements. For a temporally
scheduled document, the server may partition the document instance object 32 into
sub-trees that preserve the time order in which the portions of the document must be
presented to a client.
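
Because entries are written in post-order, each element's sub-tree occupies one contiguous run in every instance object table, so an index of element boundaries can be built in a single traversal. The Python sketch below assumes a hypothetical `(name, n_attr_values, children)` tree form, where `n_attr_values` counts the element's non-default attribute values. It reports inclusive bounds for the attribute value, child and instance tables. The attribute counts in the usage example are illustrative.

```python
def boundary_index(root):
    """Index of element boundaries for a post-order compact tree (hypothetical model).

    Returns, for each element's instance-table index, the inclusive (first, last)
    bounds of its whole sub-tree in the attribute value, child and instance
    tables; an empty range is reported as (lo, lo - 1).
    """
    counters = {"attr": 0, "child": 0, "instance": 0}
    index = {}

    def visit(node):
        _name, n_attrs, children = node
        start = dict(counters)              # where this sub-tree begins in each table
        for c in children:
            visit(c)
        counters["child"] += len(children)  # the element's own child-table run
        counters["attr"] += n_attrs         # the element's own attribute values
        me = counters["instance"]
        counters["instance"] += 1
        index[me] = {t: (start[t], counters[t] - 1) for t in counters}

    visit(root)
    return index


def item(kind, n_item_attrs=0):
    return ("item", n_item_attrs, [(kind, 1, [])])  # media element carries one "data" value


tree = ("slideshow", 0, [
    ("slide", 2, [item("audio"), item("image"), item("image", 1), item("audio", 1)]),
    ("slide", 1, [item("image")]),
])
idx = boundary_index(tree)
print(idx[8])  # {'attr': (0, 7), 'child': (0, 7), 'instance': (0, 8)}
```

Sending the sub-tree of the first "slide" then means shipping instance entries 0 to 8, child entries 0 to 7 and attribute values 0 to 7, each as one slice.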

When a client requests incremental delivery of a document, a document delivery engine 
49 in the server 26 uses the index 260 of element boundaries to determine which entries 
from the three instance object tables (attribute value, child, and instance) must be sent to 
the requesting client. Instead of sending each of the three tables in its entirety along with
a fixed length integer indicating the size of the table, the document delivery engine 49
delivers the requested range of entries along with a pair of fixed length integers, one of
which indicates the index of the first entry in the delivered table fragment, and the other
of which indicates the total number of entries in the delivered table fragment. When the
first sub-tree of the instance object is sent to the client, a third integer is transmitted by 
the server 26 to indicate the total size of the corresponding table, which allows the client 
to reserve enough memory space to receive all remaining fragments of the three tables, 
if necessary. The first two integers (i.e., index of the first entry and the number of entries 
in the table fragment) are used by the client to place the corresponding table fragment in 
the proper position in the client's copies of the tables. 
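
On the client side, the two per-fragment integers (first index and entry count) and the one-time total size are enough to slot fragments into place in any arrival order. A minimal Python sketch, with `ClientTable` as a hypothetical name and plain lists standing in for the client's table storage:

```python
class ClientTable:
    """Client-side copy of one instance object table (attribute value, child or instance)."""

    def __init__(self):
        self.slots = None  # allocated when the first fragment announces the total size

    def receive(self, first, count, entries, total=None):
        """Place one fragment; `first` and `count` are the two integers sent with it."""
        if total is not None and self.slots is None:
            self.slots = [None] * total      # reserve room for every later fragment
        if self.slots is None:
            raise ValueError("the first fragment must carry the total table size")
        if len(entries) != count or first < 0 or first + count > len(self.slots):
            raise ValueError("fragment does not fit the reserved table")
        self.slots[first:first + count] = entries

    def missing(self):
        """Indices not yet received, e.g. to decide which sub-trees to request next."""
        return [i for i, v in enumerate(self.slots) if v is None]


instances = ClientTable()
instances.receive(first=8, count=4, entries=["e8", "e9", "e10", "e11"], total=14)
instances.receive(first=0, count=2, entries=["e0", "e1"])
print(instances.missing())  # [2, 3, 4, 5, 6, 7, 12, 13]
```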

In general, the name table 104, notation table 110, and entity table 112 (i.e., the common
tables which may be shared by the DTD and instance objects) cannot be partitioned
because the entries in these tables may be shared among sub-trees. However, the
invention does not exclude the possibility that the CDATA table can be partitioned.
Hence, there is the possibility that during incremental transfer mode, the client computer
may not receive the first CDATA partition that contains the desired DTD name (i.e., the
client may get some other partition first if, in some cases, the CDATA table is being
partitioned). To deal with this case, a convention is used such that, during incremental
transfer mode, the server will send the DTD name to the client right before the table
boundaries are sent; this only needs to be done for the first partition.

When a client requests incremental delivery of an instance object that shares these tables 
with the DTD object, the server 26 may select between two alternative approaches. In the 
preferred embodiment, a single-bit flag stored in a flag register 270 is associated with 
each table to indicate which of the two approaches the server should use. If the flag bit 
is set, the table is small enough that the server 26 may send the entire table with each
sub-tree. If the flag bit is cleared, the table is too large to be sent with each sub-tree, so the
server 26 must select only those table entries that are needed to decode the particular
sub-tree. In the latter situation, the server 26 transmits the numeric index of each entry
selected from the table. The server 26 sets or clears the flag bits based upon the values 
of the integers associated with each table indicating the table size. 
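
The two approaches can be sketched in Python as one flag per common table plus a payload selector. The size threshold is a hypothetical value, since the text says only that the flag bits follow from the table-size integers. `flag_register` and `common_payload` are illustrative names.

```python
SMALL_TABLE_LIMIT = 256  # hypothetical threshold; the text does not give a value


def flag_register(table_sizes, limit=SMALL_TABLE_LIMIT):
    """One flag bit per common table: set (True) if the whole table goes with each sub-tree."""
    return {name: size <= limit for name, size in table_sizes.items()}


def common_payload(name, table, needed, flags):
    """(index, entry) pairs of a common table to ship with one sub-tree.

    needed: indices the sub-tree refers to. When the flag is cleared, only those
    entries are sent, each accompanied by its numeric index.
    """
    if flags[name]:
        return list(enumerate(table))
    return [(i, table[i]) for i in sorted(set(needed))]


flags = flag_register({"name": 26, "entity": 5000})
entity_table = [f"entity-{i}" for i in range(5000)]
print(common_payload("entity", entity_table, [17, 3, 17], flags))
# [(3, 'entity-3'), (17, 'entity-17')]
```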

When the common tables are too large to send with each sub-tree, the server 26 may 
determine which entries to send to the client in one of two ways. First, the server 26, 
when it parses a document instance 25, may create and store a record indicating which 
entries in the common tables correspond to each partition in the DTD object 34. Second, 
when the server 26 creates a partition to deliver to the client, the server 26 may parse the 
portion of the original document 25 corresponding to the partition to determine which 
elements of the name table 104, notation table 110, and entity table 112 are used in the
partition. One of ordinary skill will recognize that the first alternative is preferred in a
network that is more sensitive to increased processing overhead in the server during
document delivery, and that the second alternative is preferred in a network that is more 
sensitive to increased storage overhead in the server. 

Unlike the other three common tables, the CDATA table 114 can be partitioned since
each piece of content data belongs to a unique element of the CT instance object 32. To
support incremental delivery of documents, the server 26 independently compresses each
entry in the CDATA table 114 and separately delivers each entry with the corresponding
instance object sub-tree when incremental access is requested. This division of the 
information in the CDATA table should correspond to the partitioning of the document 
into sub-trees for depth first ordering. 
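
Compressing each CDATA entry on its own trades some compression ratio for the ability to decompress only one sub-tree's content. The Python sketch below uses `zlib` purely as a stand-in, because the text does not name a compression method. `compress_cdata` and `cdata_for_subtree` are hypothetical helpers.

```python
import zlib


def compress_cdata(entries):
    """Compress each CDATA entry independently so it can travel with its own sub-tree."""
    return [zlib.compress(text.encode("utf-8")) for text in entries]


def cdata_for_subtree(compressed, lo, hi):
    """Decompress only entries lo..hi (inclusive), i.e. one sub-tree's character data."""
    return [zlib.decompress(blob).decode("utf-8") for blob in compressed[lo:hi + 1]]


table = compress_cdata(["", "0", "Opening remarks", "Closing slide"])
print(cdata_for_subtree(table, 2, 3))  # ['Opening remarks', 'Closing slide']
```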

To further improve incremental retrieval, the top elements of the document tree may be 
placed in a separate partition to which other sub-trees are attached. In large documents, 
the lower-level sub-trees also may be divided in a similar fashion. Sub-tree division is
best performed in a DTD-specific way in order to use knowledge of the document
structure for optimization.

In response to a client's request for incremental delivery of a document, the server
first transmits the highest-level sub-tree of the document. The client examines the
top-level structure of the document and requests only the sub-trees it requires. The client may
use conventional addressing mechanisms such as SGML IDREF or HyTime location
addressing forms to identify any element instance within a document. When the client
sends a request for a specific element, the server uses the address of the requested
element to create an instance object sub-tree containing information for the requested
element and its corresponding sub-elements.

Progressive views of the document can be provided to the client by passing consecutive
partitions of the same root sub-tree incrementally, but with successively increasing depth.
Alternatively, a root sub-tree having N levels can be delivered as a partition to the client,
and the client can present the sub-tree progressively to the N levels.
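
Progressive views amount to truncating the same root sub-tree at increasing depths. A minimal Python sketch over a hypothetical `(name, children)` tuple tree; `prune` and `progressive_views` are illustrative names.

```python
def prune(node, depth):
    """Copy of a (name, children) tree truncated to `depth` levels (depth >= 1)."""
    name, children = node
    if depth <= 1:
        return (name, [])
    return (name, [prune(c, depth - 1) for c in children])


def progressive_views(root, levels):
    """Partitions of the same root sub-tree with successively increasing depth."""
    return [prune(root, d) for d in range(1, levels + 1)]


show = ("slideshow", [("slide", [("item", [("image", [])])]), ("slide", [])])
for view in progressive_views(show, 3):
    print(view)
```

The same function covers the alternative in the text. The server sends the full N-level sub-tree once, and the client calls `prune` locally with depths 1 to N.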

Referring again to FIGURE 1, the server 26 may be a proxy server acting as an 
intermediate server between a conventional server and its client computers, including 
clients located behind a "firewall" and mobile clients. The proxy server may serve as a 
bridge to improve system security or system performance. If a remote server passes a 
native encoding document 25 to the proxy server, the parser 30 in the proxy server will
convert the document 25 into the compact tree (CT) format and then forward the CT 
document to the client. 

Referring to FIGURES 9A and 9B, client computers 302 and 304 may engage in 
simultaneous collaborative author-mode access to a document stored as a CT instance 
object 306. At the same time, each client 302 and 304 may hold similar copies 308 and 
310 of the CT instance object 306 for simultaneous viewing and modification. When one 
client 302 modifies its copy 308 of the instance object by deleting, inserting, appending, 
or replacing information, a document editor 312 in the client computer 302 creates an edit
object 320 that stores the modifications. The edit object 320 preferably includes one fixed 
length integer field 322 containing a time-stamp for the modifications and another fixed 
length integer field 324 indicating the address of the instance object sub-tree affected by 
the modifications. The edit object 320 also preferably includes a table 326 having at least 
two fields, the first 328 of which indicates the index of each element in the instance 
object that was modified, and the second 330 of which identifies the modification that 
was made. 

The modifying client 302 sends the edit object 320 to the server 300, which in turn sends 
the edit object 320 to the other client 304. A document editor 314 in the server 300 then
uses the edit object to modify the instance object 306 accordingly. Likewise, a document
editor 316 in the other client 304 uses the edit object 320 to modify its copy 310 of the
instance object accordingly. Edit objects may be used with incremental delivery. 
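
The edit object's three parts can be modeled directly, with every copy applying the same edit to stay consistent. This Python sketch represents a copy of the instance object as a hypothetical mapping from sub-tree address to `{element index: content}`, and it encodes each modification as an `(operation, value)` pair. Neither representation comes from the text. `EditObject` and `apply_edit` are illustrative names, and in this simplified model "insert" and "replace" behave the same way.

```python
from dataclasses import dataclass, field


@dataclass
class EditObject:
    timestamp: int   # field 322
    subtree: int     # field 324: address of the affected sub-tree
    changes: list = field(default_factory=list)  # table 326: (element index 328, modification 330)


def apply_edit(replica, edit):
    """Apply an edit object to one copy of a (greatly simplified) instance object."""
    elements = replica.setdefault(edit.subtree, {})
    for index, (op, value) in edit.changes:
        if op in ("insert", "replace"):
            elements[index] = value
        elif op == "append":
            elements[index] = elements.get(index, "") + value
        elif op == "delete":
            elements.pop(index, None)
        else:
            raise ValueError(f"unknown modification {op!r}")
    return replica


server_copy = {3: {0: "Intro", 1: "Body"}}
other_client = {3: {0: "Intro", 1: "Body"}}
edit = EditObject(timestamp=1001, subtree=3,
                  changes=[(1, ("append", " text")), (0, ("delete", None)), (2, ("insert", "New"))])
for replica in (server_copy, other_client):  # server 300 and client 304 apply the same edit
    apply_edit(replica, edit)
print(server_copy == other_client, server_copy)
```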

Example

FIGURES 10A and 10B show a document type definition (DTD) 352 and an SGML 
document 350 generated according to the DTD 352, respectively. The DTD 352 defines 
a slide show presentation with time dependent information. According to the DTD 352, 
each slide show presentation is made up of one or more slide elements 354. Each slide 
354 in turn consists of one or more items 356 (or child elements), each item being either 
an image element 358 or an audio element 360. Each slide 354 also has two attributes: 
a "name" attribute 362 that uniquely identifies the corresponding slide, and a "next" 
attribute 364 that indicates the name of the next slide in the presentation. 

Each item element 356 (image or audio) has two attributes providing temporal 
information about the presentation of the element. A "start" attribute 366 indicates an
amount of time to delay presentation of the item after the corresponding slide presentation 
has begun. When the "start" attribute 366 is set to the default value 368 of zero, the image 
or audio element is presented precisely when the presentation of the slide begins. A 
"units" attribute 370 indicates whether the "start" attribute 366 is measured in seconds 
372, minutes 374, or hours 376. According to the DTD 352, "seconds" is the default 
value 378 of the "units" attribute 370. 
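
The "start" and "units" attributes combine into a single delay. The Python sketch below applies the DTD defaults described above (start "0", units "seconds") and orders the items of the "first" slide from the document instance. `start_offset` and the attribute dictionaries are illustrative, not part of the described format.

```python
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def start_offset(attrs):
    """Delay, in seconds, between the start of the slide and the item's presentation.

    Attributes left unspecified take the DTD defaults: start="0", units="seconds".
    """
    start = float(attrs.get("start", "0"))
    return start * UNIT_SECONDS[attrs.get("units", "seconds")]


first_slide = [("audio-1", {}), ("image-1", {}),
               ("image-2", {"start": "2", "units": "seconds"}), ("audio-2", {"start": "130"})]
for entity, attrs in sorted(first_slide, key=lambda item: start_offset(item[1])):
    print(f"{start_offset(attrs):6.1f}s  {entity}")
```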

Image elements 358 and audio elements 360 each include a data attribute 380 and 381, 
respectively, that is an SGML external entity, such as an image file or an audio file. The 
DTD 352 defines a notation 382 for image elements ("gif" represents the ".gif" format
for image data) and a notation 384 for audio elements ("ulaw" represents the ".ulaw" 
format for audio data). Both notations 382 and 384 are defined as "SYSTEM" type 
notations. Image elements 358 also include an "x" attribute 386 and a "y" attribute 388, 
which define the x-y location of the image in the slide presentation space. Both the "x" 
and "y" attributes 386 and 388 have default values 390 and 391 of zero. 

In FIGURE 10B, the document instance 350 declares that the "slideshow" document is 
defined by the document type definition "slideshow.dtd" 464, shown in FIGURE 10A. 
The document instance 350 also declares a notation and five entities not declared in the
DTD 352. The declared notation 464 ("postscript") is assigned the Adobe PostScript 3.0
format 466 ("PS-Adobe-3.0"). The first two entities, an "audio-1" entity 468 and an
"audio-2" entity 470, are "NDATA" entities associated with two audio data files, an
"audio-1.ulaw" file 472 and an "audio-2.ulaw" file 474, respectively. The other three
entities, an "image-1" entity 476, an "image-2" entity 478, and an "image-3" entity 480,
are "NDATA" entities associated with three image data files, an "image-1.gif" file 482, a
"doc.ps" file 486, and an "end.gif" file 484, respectively.

After the document type and entity declarations, the document instance 350 defines the 
highest-level element in the document hierarchy, the "slideshow" element 488, which has 
two sub-elements, a first "slide" 490 and a second "slide" 492. The first "slide" element 
490 has two attributes, a "name" attribute 494 that indicates the name of the first "slide" 
element 490 ("first") and a "next" attribute 496 that indicates the name of the second slide
element. Because no slide follows the second "slide" element 492, the second "slide"
element 492 has only a "name" attribute 498 indicating the name of the second slide 492
("end").

The "first" slide 490 consists of four item elements, two audio elements 500 and 506,
and two image elements 502 and 504. The first audio and image elements 500 and 502,
which include the "audio-1" and "image-1" entities, are presented simultaneously with
the beginning of the first slide 490 since no delay period is specified. The second image
element 504, which includes the "image-2" entity, is displayed two seconds after the
first image element 502 is displayed (i.e., the "start" attribute 507 for the second image
504 has a value of "2", and the "units" attribute 508 has a value of "seconds"). The
second audio element 506, which includes the "audio-2" entity, begins 130 seconds
after the beginning of the "first" slide 490 (i.e., the "start" attribute 510 for the second
audio element has a value of "130").

The "end" slide 492, which is presented after the "first" slide 490, consists of a single
image element 512. This image element 512 includes the "image-3" entity and is 
displayed simultaneously with the beginning of the "end" slide presentation. 

FIGURES 11A and 11B show the tables that make up the compact trees (CT) for the
DTD 352 and document instance 350. The CDATA table 392, the name table 394, and
the notation table 396 are common tables containing data representing both the DTD 352
and the document instance 350. The other tables shown in FIGURE 11A contain data
representing only the DTD 352, and the tables in FIGURE 11B contain data representing
only the document instance 350.

The name table 394 of FIGURE 11A includes 18 entries corresponding to the DTD 352
of FIGURE 10A and 8 entries corresponding to the document instance 350 of FIGURE
10B. The first entry (index 0) in the name table 394 is a null string, which is associated
with objects that do not have a name (e.g., the last entry in the document instance table).
The next 17 entries are entered according to a depth-first recursive pass through the DTD
352. Therefore, elements appearing at lower levels of the DTD hierarchy appear first in
the name table 394, while elements appearing at higher levels of the DTD hierarchy
appear later in the name table 394. For example, the "audio" element 360 and the "image"
element 358 are the lowest level elements in the DTD 352, so the corresponding entries
in the name table 394 (index 1 and index 3, respectively) appear before the entries
representing "item" element 356 (index 6), "slide" element 354 (index 12), and
"slideshow" element 353 (index 15). Likewise, the attributes associated with each
element immediately follow the element name in the name table 394, unless a similarly
named attribute appears higher in the table. For example, the entry representing the "data"
attribute 381 of the "audio" element 360 immediately follows the name table entry for the
"audio" element 360, but is not duplicated below the entry for the "image" element 358,
which also includes a "data" attribute 380. Similarly, entries corresponding to the "x"
attribute 386 and "y" attribute 388 of "image" element 358 immediately follow the entry
for the "image" element 358. The last two entries (indexes 16 and 17) in the DTD portion
of the name table 394 are associated with the "gif" and "ulaw" notations 382 and 384
declared in the DTD 352.

To reconstruct the original SGML document 350 from the tables of the compact tree, the 
computer requesting the document first reads the last entry (index 13) in the document 
instance table 450 to identify the corresponding document type definition and the location 
of the top level element (the "slideshow" element 488) in the document instance table 
450. Because the last entry in the document instance table 450 does not refer to an 
element of the document, no generic identifier or attribute is associated with the entry. 
Therefore, the corresponding "GI" and "first attribute" column entries are not examined. 
The client computer obtains the name of the required DTD from the last entry in the 
CDATA table 392 (index 12). The last entry in the document instance table 450 has at 
least one corresponding entry in the child table 444, one of which identifies the location 
of the highest level element in the document instance table 450. In this case, the last entry 
in the document instance table 450 includes exactly one child element, which is stored 
as the last entry in the child table 444 (index 12). This child element indicates the location 
(index 12) of the highest level element ("slideshow") of the compact tree in the document 
instance table 450. 
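
The lookup sequence just described starts at the last instance table entry, takes its first child that is a true element, and then recurses through each element's run of child table entries. The Python sketch below follows that sequence over a hypothetical table model (dict entries, `(kind, index)` child pairs). It omits attributes and the GI and name table lookups, and `top_element` and `rebuild` are illustrative names.

```python
def top_element(instance, child):
    """Index of the top-level element, found via the last instance table entry."""
    last = instance[-1]
    run = child[last["first_child"]:last["first_child"] + last["n_children"]]
    for kind, index in run:
        if kind == "element":      # skip pseudo elements (e.g. processing instructions)
            return index
    raise ValueError("no top-level element")


def rebuild(i, instance, child, cdata):
    """Rebuild a nested (name, children) tree from the flattened tables."""
    e = instance[i]
    kids = []
    for kind, index in child[e["first_child"]:e["first_child"] + e["n_children"]]:
        kids.append(rebuild(index, instance, child, cdata) if kind == "element" else cdata[index])
    return (e["name"], kids)


instance = [
    {"name": "audio", "first_child": 0, "n_children": 0},
    {"name": "item", "first_child": 0, "n_children": 1},
    {"name": None, "first_child": 1, "n_children": 2},   # final entry
]
child = [("element", 0), ("pseudo", 0), ("element", 1)]
cdata = ["<?processing instruction?>"]
print(rebuild(top_element(instance, child), instance, child, cdata))
```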

The computer then reads the document instance table entries for the highest level element 
to gather information about that element and its children elements. The "GI" field 452 
contains a value of "1", indicating that the corresponding generic identifier information 
is located in the second position (index 1) of the GI declaration table 416. The "identifier"
field 418 in the GI declaration table contains a value of "15", indicating that the name of
the highest level element ("slideshow") is located in the sixteenth entry (index 15) in the
name table 394. The "first attribute" field 420 in the GI declaration table contains a value
("8") greater than the highest index of the attribute declaration table 408, which indicates
that the "slideshow" element has no corresponding attributes. The requesting computer
then returns to the document instance table and finds that because the "slideshow"
element has no corresponding attributes, the "flags" field 454 has no bits set and the "first
attribute" field 456 contains an invalid index value ("10"). The "first child" field 458 and
the "number of children" field 460 indicate that the "slideshow" element has two
sub-elements at the eleventh and twelfth positions in the child table 444. The "value" field
448 of the child table 444 indicates that information for the first of these sub-elements is
located in the ninth position (index 8) of the document instance table 450 and that
information for the second sub-element is located in the twelfth position (index 11) of the
document instance table 450.

Reading the information contained in the ninth entry of the document instance table 450, 
the computer learns that the generic identifier for the sub-element is contained in the fifth 
position (index 4) of the GI declaration table 416. This entry in the GI declaration table 
416 points the computer to the thirteenth position (index 12) of the name table 394, which 
indicates that the sub-element is a "slide" element. The attribute fields 420, 422, and 424
of the GI declaration table 416 indicate that each slide element has two associated unfixed
attributes that are identified by the seventh and eighth entries (indices 6 and 7) in the
attribute declaration table 408. The first attribute of each "slide" element is an IDREF-type
attribute, the name ("next") of which is contained in the fifteenth entry (index 14)
in the name table 394. The "next" attribute has no default value, as indicated by the null
string contained in the first entry (index 0) of the CDATA table 392. The second attribute
of the "slide" element is an ID-type attribute, the name of which ("name") is contained
in the fourteenth entry (index 13) in the name table 394. The "name" attribute also has
a null default value.

After identifying the "name" and "next" attributes associated with the "slide" element, 
the computer returns to the ninth entry of the document instance table 450 and reads the 
values of the bits in the corresponding entry of the "flags" field 454. Because each of the 
first two bits is set, the computer knows it must retrieve the actual values of the "name" 
and "next" attributes from the attribute value table 440. The "first attribute" field 456 
directs the computer to the seventh and eighth positions (indices 6 and 7) of the attribute 
value table 440, which in turn refers the computer to the twenty-fourth and twenty-fifth 
positions (indices 23 and 24) of the name table 394. From the name table, the computer
learns that the "name" of the first slide element is "first" (index 24) and that the "next"
slide element is the "end" slide element (index 23). The "first child" and "number of
children" fields 458 and 460 indicate that the "first" slide element has four sub-elements,
which are listed sequentially beginning at the fifth position (index 4) of the child table
444. The computer then accesses the value field 448 of the child table 444 to learn that
information for the four sub-elements of the "first" slide element is contained in the 
second, fourth, sixth, and eighth entries (indices 1, 3, 5, and 7) of the document instance 
table. 

The computer next accesses information for the first sub-element of the "first" slide. The 
corresponding entry in the "GI" field 452 of the document instance table 450 indicates
that the generic identifier is identified in the first entry (index 0) in the GI declaration
table 416. The "identifier" field 418 of the GI declaration table 416 in turn directs the
computer to the seventh entry (index 6) of the name table 394, which identifies the first
sub-element as an "item" element. The GI declaration table 416 then directs the computer
to the fifth and sixth entries (indices 4 and 5) in the attribute declaration table 408 for
information about the two unfixed attributes associated with the "item" element. The first
attribute is a CDATA attribute, the name of which ("start") is located in the twelfth
position (index 11) of the name table 394. The "start" attribute has a default value of "0"
as indicated by the "default" field 414 of the attribute declaration table 408 and the second
entry (index 1) of the CDATA table 392. The second attribute of the "item" element is 
the "units" attribute, as indicated by the "name" field 410 of the attribute declaration table 
408 and the eighth entry (index 7) in the name table 394. The "units" attribute has three 
possible values, "seconds", "minutes", and "hours", which are referenced by the 
enumeration table 404 as the ninth through eleventh positions (indices 8-10) of the name 
table 394. The default value of the "units" attribute is "seconds". 

After identifying the attributes associated with the "item" element, the computer returns 
to the document instance table 450 and skips the "first attribute" field since none of the 
bits in the "flags" field are set. The computer then reads information from the "first child"
field 458 and the "number of children" field 460 to learn that the first "item" element has
one sub-element identified by the first entry (index 0) in the child table. The "value" field 
448 of the child table 444 directs the computer to the first entry (index 0) in the document 
instance table 450 for information about this sub-element. The "GI" field 452 of the
document instance table 450 indicates that the generic identifier for the sub-element is 
identified by the third entry (index 2) of the GI declaration table 416. The "identifier" 
field 418 of the GI declaration table 416 indicates that this sub-element is an "audio" 
element as specified in the second position (index 1) of the name table 394. The "unfixed 
attribute" field 424 of the GI declaration table 416 indicates that the "audio" element has 
a single attribute which, according to the "first" field 420 of the GI declaration table 416, 
is identified by the first entry (index 0) of the attribute declaration table 408. The "name" 
field 410 of the attribute declaration table 408 and the third entry (index 2) of the name 
table 394 indicate that this attribute is a "data" attribute. This "data" attribute is an entity 
having a Null default value, as indicated in the "default" field 414 of the attribute 
declaration table 408 and the first entry (index 0) of the CDATA table 392. Because the 
first bit of the corresponding entry in the "flags" field 454 of the document instance table 
450 is set, the computer must look to the attribute value table entry (index 0) identified
in the "first attribute" field 456 of the document instance table 450 to learn the value of 
the "data" attribute. This entry in the attribute value table 440 directs the computer to the 
nineteenth entry (index 18) in the name table 394 (i.e., the first name table entry
corresponding to the document instance), which indicates that the name of the "data"
attribute is "audio-1". Also, because the attribute is an entity, the computer accesses the 
first entry (index 0) in the entity table to determine which entity is associated with the 
"audio-1" attribute. The "type" field 434 of the entity table 430 indicates that the entity 
is of the type "NDATA", and the "notation" field 436 indicates that the entity associated 
with the "audio-1" attribute is defined in the second entry (index 1) of the notation table
396. The "name" field of the notation table 396 indicates that the entity has the "ulaw" 
notation, as listed in the eighteenth entry (index 17) of the name table 394. The notation 
table 396 also indicates that the "audio-1" attribute is a SYSTEM type object, as indicated 
in the "value" field 402 of the notation table 396. The "value" field 438 of the entity table
430 then directs the computer to the fifth entry (index 4) of the CDATA table 392 for the 
name of the entity associated with the "audio-1" attribute. The CDATA table 392
specifies that the "audio-1.ulaw" data file is associated with the "audio-1" attribute of the
first "audio" element. The computer returns to the first entry (index 0) of the document 
instance table 450 and learns that the first "audio" element has no sub-elements. 

The computer then moves to the fourth entry (index 3) of the document instance table 
450, which represents the second sub-element of the first "slide" element. Like the first 
sub-element of the "slide" element, the second sub-element is an "item" element having 
two unfixed attributes, a "start" attribute and a "units" attribute. Because none of the bits
in the corresponding entry in the "flags" field 454 of the document instance table 450 is 
set, both the "start" and the "units" attributes are set to the default values, which means 
that the second item element begins simultaneously with the first item element. The "first
child" field 458 and "number of children" field 460 indicate that the second "item" 
element also has a single sub-element, which is identified by the second entry (index 1) 
of the child table 444. Information for the sub-element is contained in the third entry 
(index 2) of the document instance table 450. 
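The "start" and "units" attributes together determine when an item is presented. The helper below is hypothetical. It applies the DTD defaults, a "start" of "0" and "units" of "seconds", and it assumes, as the examples in the text suggest, that "start" is a number expressed in the selected unit.

```python
# Seconds per value of the "units" enumeration declared in the DTD.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def start_delay_seconds(attrs: dict) -> float:
    """Presentation delay of an "item" element, applying the DTD defaults.

    Assumes "start" holds a number expressed in the chosen unit.
    """
    start = float(attrs.get("start", "0"))   # default value "0"
    units = attrs.get("units", "seconds")    # default value "seconds"
    return start * UNIT_SECONDS[units]


print(start_delay_seconds({}))                                  # 0.0 (both defaults)
print(start_delay_seconds({"units": "seconds", "start": "2"}))  # 2.0
print(start_delay_seconds({"start": "130"}))                    # 130.0
```

When both attributes keep their defaults, the delay is zero. That is why the second item in the walk-through begins at the same time as the first.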

The "GI" field 452 of the document instance table 450, the "identifier" field 418 of the 
GI declaration table 416, and the fourth entry (index 3) of the name table 394 indicate that 
the only sub-element of the second "item" element is an "image" element. The "unfixed 
attribute" field 424 of the GI declaration table 416 indicates that the "image" element has 
three unfixed attributes, which are listed consecutively beginning with the second entry 
(index 1) of the attribute declaration table 408. The first attribute of the "image" element 
is a "data" entity having a null default value. The other two attributes are the "x" and "y" 
values that indicate the "x-y" position of the "image" element in the presentation space. 
These attributes are identified in the "name" field 410 of the third and fourth entries 
(indices 2 and 3) in the attribute declaration table 408 and the fifth and sixth entries 
(indices 4 and 5) in the name table 394. The "default" field 414 of the attribute 
declaration table 408 and the second entry (index 1) in the CDATA table indicate that
both the "x" and "y" attributes have a default value of "0". The "flags" field 454 indicates
that the "x" and "y" attributes of the first "image" element have the default values, while 
the value of the "data" attribute of the first "image" element is specified by the second 
entry (index 1) of the attribute value table. The "value" field 442 of the attribute value 
table 440 and the twentieth entry (index 19) of the name table 394 identify "image-1" as 
the name of this "data" attribute. The second entry (index 1) of the entity table and the 
first entry (index 0) in the notation table 396 identify the "image-1" attribute as a 
SYSTEM type object of the ".gif" file type. The "image-1" attribute is associated with
the "image-1.gif" data file, as indicated by the sixth entry (index 5) of the CDATA table
392. 

The computer then retrieves information about the other two sub-elements of the first 
"slide" element from the sixth and eighth entries (indices 5 and 7) in the document 
instance table 450. The computer learns that each of the sub-elements is an "item" element
having a single sub-element. The first of these "item" elements includes an "image"
sub-element that is associated with the "PS-Adobe-3.0" data file and that has a presentation
delay of two seconds. The second of these "item" elements includes an "audio"
sub-element that is associated with the "audio-2.ulaw" data file and that has a presentation
delay of 130 seconds.

After the computer displays the first "slide" element, it moves to the second "slide"
element in the "slideshow" presentation. As discussed above, the second "slide" element
is represented by information contained in the twelfth entry (index 11) of the document
instance table 450. Because the second "slide" element is the last element of the
"slideshow" presentation, the first bit of the "flags" field 454 of the document instance
table 450 is cleared and the second bit is set, indicating that the "next" attribute of the
"slide" element has the null default value, while the "name" attribute has the value
("end") identified by the last entry (index 9) in the attribute value
table 440. The second "slide" element has a single "item" sub-element, which in turn has 
a single "image" sub-element. This third "image" element is associated with the
"end.gif" data file and is displayed at the default "x-y"
location (0,0) and with the default presentation delay of zero seconds. Once the second 
"slide" element has been presented, the requesting computer has fully reconstructed and 
displayed the "slideshow" document.

The first four entries of the CDATA table 392 represent information in the DTD, so every
CDATA field in one of the other tables in FIGURE 11A must contain two bits to
reference an entry in the CDATA table 392. Likewise, the first 18 entries of the name 
table 394 represent information in the DTD, so every name field in the other tables of 
FIGURE 11A must contain five bits to reference an entry in the name table 394. 
Therefore, each of the first two entries in the notation table 396, both of which represent 
information in the DTD, must be eight bits in length: five bits representing the index of 
the notation name 398 in the name table 394; one bit indicating the notation type 400 (i.e.,
"SYSTEM" or "PUBLIC"); and two bits representing the index of the notation value 402 
in the CDATA table 392. Because the notation table 396 has two entries representing 
information in the DTD, only one bit is needed in the other tables of FIGURE 11A to
access information in the notation table 396. No entities are defined in the DTD of
FIGURE 10A, so no entity table is shown in FIGURE 11A.
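The field widths quoted above follow from the number of table entries that must be addressable. One rule that reproduces the widths given for the CDATA, name, and notation tables is to use the smallest number of bits that can hold every index from 0 to n - 1. The sketch below assumes that rule, and `index_bits` is a hypothetical name. The text reserves four bits for the eight-entry attribute declaration table, which is one more than this rule yields. The rule should therefore be read as a lower bound, not as the exact rule of the preferred embodiment.

```python
def index_bits(n_entries: int) -> int:
    """Smallest field width that can hold every index 0 .. n_entries - 1 (at least 1 bit)."""
    return max(1, (n_entries - 1).bit_length())


# Entry counts for the DTD tables of FIGURE 11A, as given in the text.
for table, n in [("CDATA", 4), ("name", 18), ("notation", 2)]:
    print(table, n, index_bits(n))
```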

The DTD of FIGURE 10A defines only a single enumeration with three possible values. 
Therefore, the enumeration table 404 references only three entries, beginning at the index 
in the name table 394 indicated by a five-bit NameTable field 406. 

Each entry in the attribute declaration table 408 includes three fields: a five-bit name field 
410 representing the index of the attribute name in the name table 394; a three-bit type 
field 412 indicating which of the possible types (name, entity, CDATA, IDREF, ID, or 
enumeration) each attribute takes on; and a five-bit default value field 414 representing 
the index of the default value in the name table 394. Because the attribute declaration 
table 408 has eight entries, every entry in one of the other tables of FIGURE 11A
referencing the attribute declaration table 408 must contain four bits.

In the preferred embodiment, each entry in the GI declaration table 416 has 11 bits: five
bits representing the generic identifier name 418 in the name table 394; four bits
representing the index of the first attribute 420 in the attribute declaration table 408; and
two bits representing the number of unfixed attributes 424 (no GI in the DTD has more
than three possible unfixed attributes). Because the DTD of FIGURE 10A defines no …
field 422.
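An entry such as the 11-bit GI declaration entry can be stored by concatenating fixed-width fields. The packing sketch below assumes that the fields are laid out most significant first, in the order the text lists them: name, first attribute, and number of unfixed attributes. That bit-level layout is an assumption. `GI_DECL_LAYOUT` and `pack_entry` are hypothetical names.

```python
from typing import Sequence, Tuple

# (field, width in bits) for one GI declaration entry, most significant first.
GI_DECL_LAYOUT: Tuple[Tuple[str, int], ...] = (
    ("name", 5),         # index of the GI name in the name table
    ("first_attr", 4),   # index of the first attribute declaration
    ("num_unfixed", 2),  # number of unfixed attributes (0..3)
)


def pack_entry(values: Sequence[int],
               layout: Sequence[Tuple[str, int]] = GI_DECL_LAYOUT) -> int:
    """Concatenate fixed-width unsigned fields into one integer."""
    if len(values) != len(layout):
        raise ValueError("one value is required per field")
    word = 0
    for value, (field, width) in zip(values, layout):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{field}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


word = pack_entry([6, 4, 2])
print(word, format(word, "011b"))          # 402 00110010010
print(sum(w for _, w in GI_DECL_LAYOUT))   # 11
```

For example, a hypothetical entry with name index 6, first attribute 4, and two unfixed attributes packs into the 11-bit value 402.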

The CDATA table 392 contains nine entries representing the document instance. Therefore,
any CDATA fields in the document instance tables of FIGURE 11B must contain four bits
to reference an entry in the CDATA table 392. The name table 394 contains eight entries
representing the document instance, yielding 26 total entries in the name table 394. As a
result, five-bit fields are used in the document instance tables of FIGURE 11B to reference
the name table 394. The notation table 396 contains one entry representing the document
instance, in addition to the two DTD entries, so two-bit fields are used in the document
instance tables to reference the notation table 396. In the preferred embodiment, the entry
in the notation table 396 representing the document instance uses 10 bits: five bits
representing the index of the notation name 398 in the name table 394, one bit indicating
the notation type 400 ("PUBLIC"), and four bits representing the index of the notation
value 402 in the CDATA table 392.

Because the document instance of FIGURE 10B defines five entities, the compact tree
for the document instance includes an entity table 430. Entries in the other document
instance tables contain three bits to reference the five entries in the entity table 430. Each
entry in the entity table 430 itself contains 13 bits: five representing the index of the
entity name 432 in the name table 394; two indicating the entity type 434 (NDATA,
SDATA, or CDATA); two representing the index of the entity notation 436 in the
notation table 396; and four representing the index of the entity value 438 in the CDATA
table 392.
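Reading a packed entry reverses the packing operation. The sketch below unpacks a 13-bit entity table entry into its four fields. It assumes a most-significant-first layout in the order name, type, notation, value. The numeric codes for NDATA, SDATA, and CDATA and the sample word are illustrative assumptions, not values taken from FIGURE 11B.

```python
# (field, width in bits) for one entity table entry, most significant first.
ENTITY_LAYOUT = (("name", 5), ("type", 2), ("notation", 2), ("value", 4))
# Hypothetical code assignment for the two-bit "type" field.
ENTITY_TYPES = ("NDATA", "SDATA", "CDATA")


def unpack_entry(word: int, layout=ENTITY_LAYOUT) -> dict:
    """Split a packed entry back into its fields, starting from the low-order end."""
    fields = {}
    for field, width in reversed(layout):
        fields[field] = word & ((1 << width) - 1)  # keep the low `width` bits
        word >>= width
    return fields


# Illustrative entry: name 18, type 0, notation 1, value 4.
entry = unpack_entry(0b10010_00_01_0100)
print(entry["name"], ENTITY_TYPES[entry["type"]], entry["notation"], entry["value"])
```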

The compact tree for the document instance also includes an attribute value table 440, 
each entry of which includes a five-bit attribute value field 442 representing the index of
the attribute name in the name table 394. A child table 444 includes a one-bit field 446
indicating the type of each child entry ("ELEMENT" or "PSEUDO-ELEMENT") and a
four-bit field 448 representing the index of the child entry in either the CDATA table 392 
or the document instance table 450, discussed below. 

The document instance table 450 has 15 bits per entry. Five bits represent the index of the 
entry's GI 452 in the GI declaration table 416. A three-bit flag vector 454 indicates 
which, if any, of the three possible unfixed attributes are not set to the corresponding 
default value. For "slide" elements (GI table index of "4" and name table index of "12"), 
the first bit in the flag field 454 indicates whether the "name" attribute is set to the default 
value, and the second bit indicates whether the "next" attribute is set to the default value. 
The third bit is not used. For "item" elements (GI table index of "0" and name table index
of "6"), the first bit indicates whether the "units" attribute is set to the default value, and
the second bit indicates whether the "start" attribute is set to the default value. The third
bit is not used. For "audio" elements (GI table index of "2" and name table index of "1"),
the first bit indicates whether the "data" attribute is set to the default value, and the other
two bits are not used. For "image" elements (GI table index of "3" and name table index
of "3"), the first bit indicates whether the "data" attribute is set to the default value, and the
second and third bits indicate whether the "x" and "y" attributes, respectively, are set to
the corresponding default values. 

Each entry in the document instance table 450 also contains four bits representing the 
index for the attribute value table 440 entry holding the actual value of the first unfixed 
attribute 456 that is not set to the default value. Four bits represent the index of the 
entry's first child 458 in the child table 444. Because no element in the document instance
… each entry in the instance table 450.
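The interplay between the "flags" vector and the "first attribute" field can be sketched as follows. The sketch makes three illustrative assumptions. First, bit i of the flag vector, counting from the least significant bit, corresponds to the i-th unfixed attribute of the element. Second, the explicit values of all non-default attributes sit in consecutive attribute value table entries starting at the "first attribute" index. Third, attribute values are held directly as strings, whereas in the text they are indices into the name table. `resolve_attributes` is a hypothetical name.

```python
from typing import Dict, Optional, Sequence, Tuple


def resolve_attributes(decls: Sequence[Tuple[str, Optional[str]]],
                       flags: int,
                       first_attr: int,
                       attr_values: Sequence[str]) -> Dict[str, Optional[str]]:
    """Return the effective value of each unfixed attribute of one element.

    decls      -- (attribute name, default value) pairs in declaration order
    flags      -- bit i set means attribute i is NOT at its default value
    first_attr -- index of the element's first explicit value in attr_values
    """
    result: Dict[str, Optional[str]] = {}
    next_value = first_attr
    for i, (name, default) in enumerate(decls):
        if flags & (1 << i):
            result[name] = attr_values[next_value]  # explicit value
            next_value += 1
        else:
            result[name] = default                  # DTD default
    return result


# "image" element: data (null default), x and y (default "0").
image_decls = [("data", None), ("x", "0"), ("y", "0")]
attr_values = ["audio-1", "image-1"]  # toy attribute value table
print(resolve_attributes(image_decls, 0b001, 1, attr_values))
```

Only attributes that differ from their defaults consume attribute value entries. An element that uses all of its defaults therefore costs nothing beyond its flag bits.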

… compact tree representation of the DTD consumes only 128 bytes. Likewise, the
actual document instance file requires approximately 731 bytes of storage space, while
the compact tree representation of the document instance uses only 183 bytes.
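The storage figures above can be expressed as a ratio. The arithmetic below uses only the two document-instance sizes quoted in the text.

```latex
\frac{183\ \text{bytes}}{731\ \text{bytes}} \approx 0.25,
\qquad
1 - 0.25 = 0.75
```

In other words, the compact tree form of the document instance needs roughly a quarter of the space of the SGML file, a saving of about 75 percent.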

Referring to FIGURE 12, a sample SGML document instance 550 generated according
to the DTD 352 of FIGURE 10A includes a top-level "slideshow" element 552 and three
"slide" sub-elements: a "first" slide element 554, a "second" slide element 556, and an
"end" slide element 558. The simple structure of the document instance 550 lends the
corresponding parse tree naturally to partitioning at each …

… tree also could be partitioned in other ways.

… 564, an enumeration table 566, an attribute declaration table 568, a GI declaration …

… of four partitions, or sub-trees, in the CT representation of the document instance. The
… top-level element of the document: the "slideshow" element 552 and each of the slide
sub-elements 554, 556, and 558. The next partition 584 ("first") in the element boundaries
table 580 includes only those sub-elements associated with the "first" slide element 554.

Likewise, the next partition 586 ("second") includes only those sub-elements associated
with the "second" slide element 556, and the last partition 588 ("end") includes only
those sub-elements associated with the "end" slide element 558. 

For each sub-tree, the element boundaries table 580 includes seven information fields: an 
"element" field 590 identifying the sub-tree; an "attribute value start" field 592 and an 
"attribute value end" field 594 identifying the indices of the first and last entries, 
respectively, in the attribute value table 576 corresponding to the sub-tree; a "child start" 
field 596 and a "child end" field 598 identifying the indices of the first and last entries, 
respectively, in the child table 578 corresponding to the sub-tree; and a "document 
instance start" field 600 and a "document instance end" field 602 identifying the indices
of the first and last entries, respectively, in the document instance table 572 correspond- 
ing to the sub-tree. 

When the requesting computer first requests partitioned delivery of the document, the 
CDATA table 564, the name table 560, the notation table 562, and the entity table 574 
are sent in their entirety, but only the portions corresponding to the "top" sub-tree 582 are
sent for the attribute value table 576 (indices 10 through 14), the child table 578 (indices 
8 through 19), and the document instance table 572 (indices 16 through 20). When the 
requesting computer later requests a child element of one of the "slide" elements, only 
those entries in the attribute value, child, and document instance tables corresponding to 
that "slide" element are sent. For example, if a child element of the "second" slide
element 556 is requested, the requesting computer receives only the seventh through ninth 
entries (indices 6 through 8) from the attribute value table 576, the fifth through seventh 
entries (indices 4 through 6) from the child table 578, and the ninth through fourteenth 
entries (indices 8 through 13) from the document instance table 572. 
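The partitioned delivery just described amounts to slicing three tables by the ranges recorded in the element boundaries table 580. The sketch below assumes inclusive ranges and uses placeholder table contents. The ranges for the "top" and "second" partitions are the ones quoted above. `Boundary`, `payload`, and the table keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Boundary:
    """One row of an element boundaries table; all ranges are inclusive."""
    element: str
    attr_start: int
    attr_end: int
    child_start: int
    child_end: int
    inst_start: int
    inst_end: int


SHARED_TABLES = ("cdata", "name", "notation", "entity")  # sent whole on first request


def payload(tables: Dict[str, Sequence], b: Boundary, first_request: bool) -> Dict[str, list]:
    """Collect the table entries to send for the sub-tree described by `b`."""
    out: Dict[str, list] = {}
    if first_request:
        for key in SHARED_TABLES:
            out[key] = list(tables[key])
    out["attr_value"] = list(tables["attr_value"][b.attr_start:b.attr_end + 1])
    out["child"] = list(tables["child"][b.child_start:b.child_end + 1])
    out["instance"] = list(tables["instance"][b.inst_start:b.inst_end + 1])
    return out


# Placeholder tables: partitioned tables hold their own indices as entries.
tables: Dict[str, Sequence] = {key: ["..."] for key in SHARED_TABLES}
tables.update(attr_value=list(range(15)), child=list(range(20)), instance=list(range(21)))

top = Boundary("top", 10, 14, 8, 19, 16, 20)
second = Boundary("second", 6, 8, 4, 6, 8, 13)

first = payload(tables, top, first_request=True)
later = payload(tables, second, first_request=False)
print(later["attr_value"], later["child"], later["instance"])
```

In this design, the tables shared by every partition travel once. Each later request then carries only the three slices for the requested sub-tree.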

Implementation 

The method of the invention may be implemented in hardware or software, or a 
combination of both. However, preferably, the method of the invention is implemented 
in computer programs executing on programmable processors each comprising a 
processor, a data storage system (including volatile and non-volatile memory and/or
storage elements), at least one input device, and at least one output device.

Each program is preferably implemented in a high level procedural or object oriented
programming language … the language may be a compiled or interpreted language.

Each such computer program is preferably stored on a storage media or device (e.g.,
ROM, flash RAM, or magnetic diskette) readable by a general or special purpose
… The inventive system may also be considered to be implemented as a
processor-readable storage medium, configured with a computer program, where the
storage medium …
the functions described herein.

… scope of the invention. For example, the CT instance object may have … that provide
information such as character set …, sequencing of objects, time stamps, ownership, and
other parameters needed to … Furthermore, structured document syntaxes other than SGML
may be … documents into the SGML syntax. Additional … a subscriber or broadcast
model (a client requests a subscription to information, and …), or a server push model
(a client registers for specific information, and the server from
time to time sends such specific information without further request), or combinations of 
such models. 

Accordingly, it is to be understood that the invention is not to be limited by the specific 
illustrated embodiments, but only by the scope of the following claims.
CLAIMS

What is claimed is: 

1. A computer network comprising:

(a) a client computer that authorizes receipt of a structured document, and is 
capable of receiving such structured document; and 

(b) a server computer that, upon receiving at least an initial receipt authoriza- 
tion from a client computer, retrieves the structured document, parses the 
document into a structural portion and a content portion, and thereafter 
sends at least some of the two portions to the client computer. 



2. The computer network of claim 1 wherein the server also sends the client
computer a document type definition corresponding to the requested document.



3. The computer network of claim 1 wherein the server compresses the structured 
portion of the document into a compact tree.

4. The computer network of claim 1 wherein the server compresses the content 
portion of the document.

5. The computer network of claim 1 further comprising a processor in the client that
traverses the structured portion and the content portion of the document to 
reconstruct the document. 

6. The computer network of claim 1 wherein the structured portion of the document
is arranged to allow incremental access by the client computer. 
7. The computer network of claim 1 wherein the structured portion of the document
comprises a compact tree, at least one element of which comprises an associated
sub-tree representing sub-elements hierarchically subordinate to the element.

8. The apparatus of claim 7 wherein the compact tree comprises a table defining the
structural elements of the document.

9. The computer network of claim 1 wherein the structured document adheres to a
formalized syntax.

10. The computer network of claim 9 wherein the syntax comprises SGML.

11. A computer data structure for storing structured document data in a persistent
object storage of a network server computer, comprising a compact tree including
a parse tree representing the structural hierarchy of the structured document, and
a content list representing the data contained in the structured document.

12. A persistent object storage in a network server computer configured to store
structured document data as a compact tree comprising a parse tree representing
the structural hierarchy of the structured document, and a content list representing
the data contained in the structured document.



BNSDOCID: <WO 97342 40A1_I_> 



WO 97/34240 






13. A method of storing information representing a structured document in a 
persistent object storage of a network server computer, the method comprising the 
steps of:

(a) parsing the structured document to form a parse tree representing the 
structural hierarchy of the structured document and a content list 
representing data contained in the structured document, and 

(b) storing the parse tree and the content list in the persistent object storage. 

14. The method of claim 13 further comprising the step of storing in the persistent 
object storage a document type definition corresponding to the structured 
document. 



15. The method of claim 13 further comprising the step of compressing the content
list before storing it in the persistent object storage.
16. A method of sharing a structured document in a computer network, the method
comprising the steps of:

(a) transmitting a parse tree representing the structural hierarchy of the
structured document and a content list representing data contained in the
structured document from a network server computer to a network client
computer,

(b) reconstructing the structured document in the network client computer
using the transmitted parse tree and content list.

17. The method of claim 16 further comprising the step of transmitting a document
type definition corresponding to the structured document from the network server
computer to the network client computer.

18. The method of claim 16 wherein the content list comprises compressed text data.

19. The method of claim 18 further comprising the step of expanding the compressed
text data after transmitting it to the network client computer.

20. The method of claim 16 wherein the parse tree and the content list are transmitted
incrementally to the network client computer.

21. The method of claim 16 wherein the parse tree is transmitted to the network client
computer as partitioned sub-trees.






22. A method of displaying a structured document in a network client computer, the
method comprising the steps of:

(a) receiving from a network server computer a parse tree representing the 
structural hierarchy of the structured document and a content list 
representing data contained in the structured document, 

(b) reconstructing the structured document in the network client computer 
using the received parse tree and content list, and 

(c) displaying the reconstructed structured document on a display device 
attached to the network client computer. 
23. A computer program, residing on a computer-readable medium, comprising
instructions for causing a server processor to parse a structured document data
into a compact tree comprising a parse tree representing the structural hierarchy
of the structured document, and a content list representing the data contained in
the structured document.

24. A computer program, residing on a computer-readable medium, for representing
a structured document in a persistent object storage of a network server computer,
comprising instructions for causing a processor to:

(a) parse the structured document to form a parse tree representing the
structural hierarchy of the structured document and a content list
representing data contained in the structured document, and

(b) store the parse tree and the content list in the persistent object storage.

25. A computer program, residing on a computer-readable medium, for sharing a
structured document in a computer network, comprising instructions for causing
at least one processor to:

(a) transmit a parse tree representing the structural hierarchy of the structured
document and a content list representing data contained in the structured
document from a network server computer to a network client computer,

(b) reconstruct the structured document in the network client computer using
the transmitted parse tree and content list.




26. A computer program, residing on a computer-readable medium, for displaying a
structured document in a network client computer, comprising instructions for
causing a processor to:

(a) receive from a network server computer a parse tree representing the 
structural hierarchy of the structured document and a content list 
representing data contained in the structured document, 

(b) reconstruct the structured document in the network client computer using 
the received parse tree and content list, and 

(c) display the reconstructed structured document on a display device 
attached to the network client computer. 
<!element slideshow - - (slide+)>
<!element slide - - (item+)>
<!attlist slide
    name   id      #required
    next   idref   #implied
>
<!element item - - (audio | image)>
<!attlist item
    units  (seconds, minutes, hours)  seconds
    start  CDATA   "0"
>
<!element audio - o empty>
<!attlist audio
    data   entity  #required
>
<!element image - o empty>
<!attlist image
    data   entity  #required
    x      CDATA   "0"
    y      CDATA   "0"
>
<!notation gif system "gif">
<!notation ulaw system "ulaw">
<!doctype slideshow system "slideshow.dtd" [
<!notation postscript system "PS-Adobe-3.0">
<!entity audio-1 system "audio1.ulaw" ndata ulaw>
<!entity audio-2 system "audio2.ulaw" ndata ulaw>
<!entity image-1 system "image1.gif" ndata gif>
<!entity image-2 system "doc.ps" ndata postscript>
<!entity image-3 system "end.gif" ndata gif>
]>
<slideshow>
<slide name=first next=end>
<item>
<audio data=audio-1>
</item>
<item>
<image data=image-1>
</item>
<item units=seconds start=2>
<image data=image-2>
</item>
<item start="130">
<audio data=audio-2>
</item>
</slide>
<slide name=end>
<item>
<image data=image-3>
</item>
</slide>
</slideshow>
<!doctype slideshow system "slideshow.dtd" [
<!notation postscript system "PS-Adobe-3.0">
<!entity audio-1-1 system "audio1.ulaw" ndata ulaw>
<!entity audio-1-2 system "audio2.ulaw" ndata ulaw>
<!entity image-1-1 system "image1.gif" ndata gif>
<!entity image-1-2 system "doc.ps" ndata postscript>
<!entity audio-2-1 system "audio3.ulaw" ndata ulaw>
<!entity image-2-1 system "image2.gif" ndata gif>
<!entity image-2-2 system "image3.gif" ndata postscript>
<!entity image-3-1 system "end.gif" ndata gif>
]>
<slideshow>
<slide name=first next=second>
<item>
<audio data=audio-1-1>
</item>
<item>
<image data=image-1-1>
</item>
<item units=seconds start=2>
<image data=image-1-2>
</item>
<item start="130">
<audio data=audio-1-2>
</item>
</slide>
<slide name=second next=end>
<item>
<audio data=audio-2-1>
</item>
<item>
<image data=image-2-1>
</item>
<item start=45>
<image data=image-2-2>
</item>
</slide>
<slide name=end>
<item>
<image data=image-3-1>
</item>
</slide>
</slideshow>
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COMPACT TREE FOR STORAGE AND RETRIEVAL 
OF STRUCTURED HYPERMEDIA DOCUMENTS 

BACKGROUND OF THE INVENTION 

1 . Field of the Invention 

5 This invention relates to electronic data storage, and more particularly to storage, 
retrieval, transmission, and presentation of structured hypermedia documents in 
distributed computing systems. 

2. Description of Related Art 

Structured documents are a class of electronic information storage in which the text 
10 content of a document includes embedded character sequences known as "markup" which 
identify structural elements and attributes or formatting codes for the content. The 
Standard Generalized Markup Language (SGML) is an example of a syntax for storing 
and processing structured documents (International Organization for Standardization, ISO 
International Standard 8879 - Standard Generalized Markup Language, 1985 Geneva, 
15 Switzerland). The HyTime standard is an example of a markup language for structured 
hypermedia documents (International Organization for Standardization, ISO International 
Standard 10744 - Hypermedia Time-Based Structuring Language (HyTime), 1992. 
Geneva, Switzerland). The Hypertext Markup Language (HTML), which is defined as an 
SGML document type definition (DTD), is a widely used markup language for hypertext 
20 documents (Berners-Lee, T., and Connolly, D., Hypertext Markup Language - 2.0, 
Internet Engineering Task Force RFC 1 866, November 1995). 

Under the SGML standard, a DTD defines the structural components that are required and 
allowed in a particular type of document. Each DTD begins with a declaration of the 
document type, i.e., a statement that assigns an identifier to the particular document type 
(e.g., a DTD defining a magazine may begin with the document type declaration
"magazine"). All documents of the declared type should be structured according to the
DTD. The DTD then defines the elements, attributes, entities, and notations that may be
used to compose a document of the given type. Elements are the components that create
the logical structure of the document (e.g., a magazine's elements are articles, which may
consist of text, pictures, and graphical figures or tables). Attributes are the characteristics
that each element type may take on in a document of the given type (e.g., one attribute
of an article is the page number on which it begins). Entities may be used to refer to long
strings of text or to external files (e.g., the term "Johnson article" may refer to an article
in another magazine). Notations identify non-SGML components and provide instructions
for using these components when presenting the document. Additional information on
documents structured according to the SGML syntax may be found in The SGML Primer,
SoftQuad, Inc., 1995.

One conventional approach by which structured documents in electronic form are 
delivered involves a client-server division. A client application executing on a local client 
processor retrieves a document in its native encoding from a server software program
executing on a remote server processor (transfer). The client application then parses the
document locally according to the grammar of the document structure (parsing), 
combines the output of the parser with local requirements regarding style attributes of 
individual elements (rendering), and presents the document and its content to the user 
interface (display). A retrieved document is transferred in bulk to the client. 

Bandwidth is a scarce resource in present client-server systems, so it would be desirable
to minimize the amount of data required to be transferred from a server to a client.
Accordingly, a second approach by which structured documents are delivered is for the
document server to transmit to the client a compressed form of the document's native
encoding (compression), using conventional compression techniques such as those
described in Witten, I. H., Moffat, A., and Bell, T., Managing Gigabytes - Compressing
and Indexing Documents and Images, NY: Van Nostrand Reinhold, 1994. The client
retrieves the compressed document (transfer), decompresses the document to obtain the
native encoding (decompression), and then parses the document locally according to the
grammar of the document structure (parsing), combines the output of the parser with local
requirements regarding style (rendering), and presents the document and its content to the
user interface (display). However, a retrieved compressed document still is transferred
in bulk to the client.

In these conventional approaches, a document authoring tool generates documents in a
native encoding. Thereafter, the documents are transferred to the server and retained in
the native encoding. If a document has been structured for time-dependent presentation
(that is, some elements of the document are to be displayed before other elements), the
internal scheduling information of the document is not used to schedule transmission of
document elements from server to client. Instead, the document is transferred in bulk to
the client for parsing, rendering, and time-dependent display at the client. Further, in the
conventional approaches, multiple users accessing the same document for simultaneous
collaboration each must retrieve a new copy of the entire document each time the original
document is modified by another user.

Accordingly, it would be useful if the efficiency of the present client-server system of
structured document access could be improved. The present invention provides a system
and method that provide such improved efficiency.
SUMMARY OF THE INVENTION

The present invention provides a system and method of storage, retrieval, transmission, 
and presentation of structured hypermedia documents in client-server distributed 
computing systems. In the preferred embodiment, a network server maintains a persistent 
storage of an arbitrary number of preprocessed structured documents. Each structured 
document is processed and parsed once at the network server, and the result is stored in 
compact tree (CT) form in a persistent object store. The CT form of the document then 
is delivered to requesting client computers as one or more objects. When a document is 
edited at a client computer, the CT form or the edit operations can be transferred from the 
client to the server. 

Since the document is retrieved in pre-parsed and pre-processed form, the client computer 
need not have a document parsing function. The parsing operation is done once at the 
server, and thereafter the resulting representation may be accessed many times by 
different clients. The clients do not need to parse the compact representation, because the 
representation retrieved by the client contains a parse tree and symbol definitions needed 
to render the document for presentation. 

The compact tree techniques described above result in significantly smaller data transfers 
than traditional document transfer techniques. The actual bandwidth reductions depend 
upon the size and structure of the structured documents being transferred. Furthermore, 
transferring compact tree representations incrementally provides additional bandwidth 
and network performance improvements. 

Since the CT representation and the original source representation of the document are 
essentially equivalent, the CT representation may replace the source representation, 
thereby saving storage space and cache memory space at the server. Similarly, since
client computers generally cache documents in source format, caching the CT representation
instead reduces cache memory usage in the client computers. The CT representation also
saves cache storage space at proxy servers that act as intermediate servers between the
client computers and a remote server.

Since the CT representation of a structured document reduces the time needed to access 
the document, the network server is able to sustain more sessions in a given interval. 
Because client computers do not need to parse the document, the clients are able to 
present the document more quickly than when using traditional document transfer 
techniques. 

Full-text and content-based retrieval are important requirements for hypermedia
document systems. The CT representation of structured documents separates the 
document's structural elements from its content, allowing the content to be stored and 
indexed by conventional full-text or content-based retrieval systems, while the structural 
portions can be queried by a structure-query processing language such as the HyQ query 
language [ISO 1992]. 

The CT representation also allows non-linear, partial-retrieval (incremental), and 
progressive access to documents. Since a document is stored on the server as a compact 
parse tree, client computers may access a subsection of the tree without retrieving and 
processing the entire tree. This speeds client access, particularly for large documents, 
since the compact tree can be partitioned into sub-trees deliverable in an arbitrary order,
including non-linear and temporal ordering. Although incremental access is useful in
supporting non-linear and temporally ordered access to portions of a document,
conventional document transfer processes do not permit it; they provide access only to an
entire document file. Processing time also is reduced because the syntactical validation performed by
the parser is performed only once at the server, rather than each time the document is 
accessed by a client. 

In collaborative situations, multiple client computers may simultaneously view, and may 
even modify, the same document. To maintain document consistency among all clients 
during collaboration, changes made to the document at each client are propagated to the
server and to the other clients. The object-oriented storage model of the document permits 
edit transaction objects to be incorporated into the incremental delivery of the document. 

Advantages of the invention may include one or more of the following: 

A computer network may be simplified by eliminating the need for document 
parsers in client computers requesting access to structured documents.

A structured document may be parsed once by a network server and then accessed
indefinitely by multiple client computers. 

Network storage and bandwidth requirements may be reduced by transferring a 
compact representation of a structured document instead of transferring the entire 
document itself. 

Network performance may be improved by reducing the time required to access 
a structured document. 

Conventional information retrieval systems may be used to store and index the 
content of a structured document. 

Modifications to a structured document may be sent to a client user who is
viewing the document without retransmitting the entire document.

The details of the preferred embodiment of the present invention are set forth in the 
accompanying drawings and the description below. Once the details of the invention are 
known, numerous additional advantages, innovations, and changes will become obvious 
to one skilled in the art. 
BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 is a functional block diagram of a computer network that stores and transmits
structured documents in a compact tree representation. 

FIGURES 2A and 2B are flow diagrams of a network server's processing of a structured 
document.

FIGURE 3 is a flow diagram of a network client's processing of a compact tree 
representation of a structured document. 

FIGURE 4 is a structural diagram of the elements of a document type definition (DTD) 
object for a compact tree representation of structured documents. 

FIGURE 5 is a schematic diagram of the elements of an instance object for a compact tree
representation of a structured document. 

FIGURES 6A and 6B are block diagrams illustrating the storage and transmission format
of a DTD object. 

FIGURE 7 is a block diagram illustrating the storage and transmission format of the 
instance object.

FIGURES 8A and 8B are block diagrams of a computer network that stores and transmits
structured documents in a partitioned compact tree representation. 

FIGURES 9A and 9B are block diagrams of a computer network in which modifications 
to a structured document are transmitted between network computers. 
FIGURES 10A and 10B are a document type definition (DTD) and an example SGML
document generated according to the DTD, respectively. 

FIGURES 11A and 11B are tables found in compact tree representations of the DTD and
SGML document of FIGURES 10A and 10B.

FIGURE 12 is another example SGML document generated according to the DTD of 
FIGURE 10A. 

FIGURES 13A and 13B are tables found in partitioned compact tree representations of
the SGML document of FIGURE 12 and the corresponding DTD. 

Like reference numbers and designations in the various drawings indicate like elements. 
DETAILED DESCRIPTION OF THE INVENTION

Throughout this description, the preferred embodiment and examples shown should be 
considered as exemplars, rather than as limitations on the present invention. 

Referring to FIGURE 1 and FIGURES 2A and 2B, in a first embodiment of the present
invention, a structured document 25 encoded according to a standard syntax, such as 
SGML [ISO 1988], may be shared among the various computers of a computer network 
20. In general, a network server computer 26 will receive a request for the document 25 
from at least one of a set of client computers 22 and 24 (Step 50). Upon receiving the 
request, the server 26 retrieves the document 25, either from an associated storage device 
27 if the server 26 is a computer system that permanently maintains the document 25 
(e.g., the server 26 is an Internet server), or from another computer system if the server 
does not permanently maintain the document (e.g., the server 26 is a proxy server) (Step
52). The document 25 is accompanied by a DTD file 28 that defines the syntactical 
elements used to convert the document 25 into a compact tree (CT) representation. The 
DTD file 28 allows the document to be reconstructed from the CT representation and to 
be presented as formatted output. 

After receiving the document 25, the server 26 uses an internal parser 30 to parse through 
the SGML document 25, breaking the document 25 into a parse tree 41 representing the 
document's structural hierarchy and a content list 43 representing the data contained in 
the document 25 (Step 54). The server 26 then converts the parse tree 41 and the content 
list 43 into structure and character data (CDATA) tables, respectively (Step 56). The 
structure tables store the document's structural information in compressed form, and the 
CDATA table stores the data content of the document 25 in compressed and indexable 
form. These tables are described in detail below. The server 26 then stores the structure 
and CDATA tables as a compact tree (CT) instance object 32 in a persistent data store 31,
such as an object-oriented database, an object-oriented file system, or an object-relational 
database (Step 58). The instance object 32 is structured such that related structural 
elements (i.e., the document's elements and their corresponding sub-elements) are located
together in the compact tree, which allows the clients 22 and 24 and the server 26 to 
access the instance object, and therefore the structured document, incrementally, as 
described below. If the server 26 includes an information retrieval system 35 (Step 60), 
the server 26 passes the CDATA portion of the instance object 32 to the information 
retrieval system 35 for storage (Step 62). The server 26 also compresses the DTD file 28 
into a compact DTD object 34 and stores it in the persistent object store 31 (Step 66), if
the server 26 has not already done so while previously processing a document of the same 
type (Step 64). The server 26 uses standard compression techniques to create the instance 
object 32 and the DTD object 34. 

Instead of delivering the entire SGML document 25 to the requesting client 22 or 24, the 
server 26 delivers only a copy 33 of the CT instance object 32 and, if necessary, a copy 
39 of the CT DTD object 34. If the client 22 or 24 requests only a portion of the 
document (Step 68), the server 26 sends only the sub-objects of the CT instance object 
32 relating to the requested portion (Step 70). Otherwise, the server 26 transmits the 
entire CT instance object to the requesting client 22 or 24 (Step 72). Furthermore, the 
server 26 transmits the DTD object to the client 22 or 24 (Step 76) only when the client 
specifically requests the DTD object 34 (Step 74). Since the DTD may be any SGML- 
conforming definition, the client generally will need to retrieve a copy 39 of the DTD 
object 34 with the document instance object 32. However, if the client 22 or 24 can 
process the document instance 32 without the DTD object 34, or if the client 22 or 24 
already has a copy 39 of the DTD object 34, then only a copy 33 of the document instance 
object 32 need be transmitted to the client 22 or 24. The server 26 then waits for the next 
incremental or full-document request from the client 22 or 24 (Step 78).
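
The delivery decision in Steps 68 through 76 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the server's actual implementation: the CTInstance class, the respond function, and the idea of keying serialized sub-objects by a portion name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CTInstance:
    # Hypothetical container: serialized sub-objects keyed by the document portion they cover.
    sub_objects: dict


def respond(instance, dtd_object, portion=None, wants_dtd=False):
    """Return the CT objects the server sends for one request (Steps 68-76)."""
    reply = []
    if portion is not None:
        reply.append(instance.sub_objects[portion])   # Step 70: only the requested sub-object
    else:
        reply.extend(instance.sub_objects.values())  # Step 72: the entire instance object
    if wants_dtd:
        reply.append(dtd_object)                     # Step 76: DTD object only on explicit request
    return reply


doc = CTInstance({"ch1": b"<ct-1>", "ch2": b"<ct-2>"})
print(respond(doc, b"<dtd>", portion="ch2"))  # [b'<ct-2>']
```

A client that already holds the DTD object simply leaves wants_dtd false, so only instance data crosses the network.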

If a document is structured according to a syntax that the server 26 does not recognize, 
the server 26 may follow one of several approaches. First, the server 26 may reject the 
document completely and notify the requesting client that the document could not be 
delivered. Second, if only a portion of the document is unrecognizable, the server 26 may 
ignore the unrecognizable portion and transmit the rest. Third, the parser 30 may replace
the unrecognizable syntax with an acceptable syntax. Fourth, the server 26 may transmit 
the unrecognizable syntax to the client as character data (CDATA).

Referring also to FIGURE 3, after requesting the document or a portion thereof from the 
server 26, the client 22 or 24 receives a copy 33 of the CT instance object 32 or of the 
appropriate sub-objects (Step 82) and stores it in a local storage area 37, such as cache 
memory or a hard disk drive (Step 84). If the client 22 or 24 has not already received the 
DTD object 34 in response to an earlier request (Step 86), that client requests the DTD 
object 34 (Step 88) and, after receiving it, stores the object in the local storage area 37 
(Step 90). Using the DTD object as a guide, a CT processor 45 or 47 in the client 22 or 
24 then traverses the CT instance object (Step 92) and converts that object into a 
formatted document (Step 94). The client 22 or 24 then presents the formatted document 
to an output device 36 or 38, such as a video display or a printer (Step 96). 

In a second embodiment of the present invention, documents are pre-processed in the
server 26 so that they are available in processed form before they are requested by a client.
In a third embodiment of the present invention, documents are processed in the server 26
as described above when clients request them, and the processed forms are stored in the
server so that they are available in processed form for future client requests.

Referring to FIGURE 4, a CT DTD object 34 preferably is organized into several tables, 
each of which contains a portion of the DTD information needed for recreation 
(rendering) and presentation of the structured document. In the preferred embodiment, 
the tables in the DTD object 34 include a generic identifier (GI) table 102, a name table 
104, an attribute declaration table 106, an attribute enumeration table 108, a notation table 
110, an entity table 112, and a character data (CDATA) table 114. Preferably, DTD 
information that is needed only at parse time (e.g., the content model of each generic
identifier) is discarded and is not placed into the DTD object 34. The name table 104 and
the CDATA table 114 preferably consist of text strings; the other tables preferably consist
of constant length numerical fields. In the preferred embodiment, the name table 104,
notation table 110, entity table 112, and CDATA table 114 are "common" tables that may
be shared by the DTD object 34 and the instance object 32. The GI table 102, attribute 
declaration table 106, and attribute enumeration table 108 are DTD-specific tables. 

To simplify access to the information stored in the DTD object 34, each table preferably 
has constant-size entries which may be identified by simple numeric indices. Client 
computers can find information in a table by multiplying the entry index for the desired 
information by the predetermined size of the entries. Variable length information, such 
as a list of sub-elements (or "children" elements) associated with an element, preferably 
is stored as consecutive entries in either the name table 104 or the CDATA table 114. As
a result, all variable length information may be represented by two numeric values: one 
value representing the index of the first table entry containing a portion of the desired 
information, and the other value representing the number of entries required to store the 
information. Storing variable length information in constant size table entries trades space 
efficiency for access efficiency. One of ordinary skill in the art would recognize that the 
tables may be designed differently to strike a different balance between space efficiency 
and access efficiency. 
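
The constant-size lookup and the (first index, count) convention for variable length information can be sketched in Python. The 16-bit little-endian entry layout and the helper names entry_at and variable_run are assumptions made for illustration; the description does not fix an entry size.

```python
import struct

# Assumed layout: every entry is one unsigned 16-bit little-endian integer.
ENTRY = struct.Struct("<H")


def entry_at(table: bytes, index: int) -> int:
    """Read entry `index`; its byte offset is the index times the constant entry size."""
    return ENTRY.unpack_from(table, index * ENTRY.size)[0]


def variable_run(table: bytes, first: int, count: int) -> list:
    """Read variable length data stored as `count` consecutive entries starting at `first`."""
    return [entry_at(table, first + i) for i in range(count)]


table = struct.pack("<5H", 7, 3, 9, 4, 1)
print(entry_at(table, 2))         # 9
print(variable_run(table, 1, 3))  # [3, 9, 4]
```

No scanning or delimiter parsing is needed: each entry's location is computed directly from its index.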

In the preferred embodiment, the GI table 102 maintains a list of generic identifiers 
associated with the attributes contained in the attribute declaration table 106. The GI table 
102 preferably contains at least four fields for each generic identifier. The first field 120 
indicates the index of the generic identifier string stored in the name table 104. The 
second field 122 indicates the index of the first attribute stored in the attribute declaration 
table 106 for each generic identifier. As described below, all attributes associated with 
a generic identifier are stored consecutively in the attribute declaration table 106, with 
fixed attributes followed by non-fixed attributes. The third field 124 and fourth field 126 
of the GI table 102 indicate the number of fixed attributes and the number of non-fixed
attributes, respectively, associated with each generic identifier. If desired, other 
information may be stored in the GI table 102. 
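
A generic identifier's attributes are stored consecutively, with the fixed ones first, so the four GI table fields are enough to locate both groups. The following minimal sketch uses a hypothetical GIEntry record to stand in for one GI table row.

```python
from typing import NamedTuple


class GIEntry(NamedTuple):
    name_index: int   # field 120: index of the GI string in the name table
    first_attr: int   # field 122: first row in the attribute declaration table
    n_fixed: int      # field 124: number of fixed attributes
    n_nonfixed: int   # field 126: number of non-fixed attributes


def attribute_rows(gi: GIEntry):
    """Return (fixed, non_fixed) row ranges in the attribute declaration table."""
    fixed = range(gi.first_attr, gi.first_attr + gi.n_fixed)
    non_fixed = range(fixed.stop, fixed.stop + gi.n_nonfixed)
    return fixed, non_fixed


fixed, non_fixed = attribute_rows(GIEntry(name_index=0, first_attr=4, n_fixed=1, n_nonfixed=2))
print(list(fixed), list(non_fixed))  # [4] [5, 6]
```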
The name table 104 preferably stores a single copy of the text string associated with each
generic identifier. Each string contained in the table 104 is represented by an index that 
identifies the position of the string in the table 104. The indices are represented by a 
binary value containing the least number of bits required to identify every string in the
table (i.e., ceil(log2 N) bits, where N is the number of strings in the name table 104). By storing
all identifier strings in the name table, the DTD object 34 reduces the storage space 
required to represent all string identifiers. 
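
The minimum index width can be computed directly from the table size. This sketch assumes at least one bit is used even when the table holds a single string, a case the text does not address. The index_bits name is illustrative.

```python
def index_bits(n: int) -> int:
    """Fewest bits that give each of n strings a distinct index 0..n-1 (ceil(log2 n) for n >= 2)."""
    if n < 1:
        raise ValueError("table must contain at least one string")
    return max(1, (n - 1).bit_length())


print([index_bits(n) for n in (1, 2, 8, 9, 300)])  # [1, 1, 3, 4, 9]
```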

The attribute declaration table 106 maintains a list of the attributes associated with each 
generic identifier, their types, and their default values. The table 106 preferably includes 
at least three fields: a name field 128, a type field 130, and a value field 132. The name 
field 128 indicates the index of the attribute name stored in the name table 104. The type 
field 130 indicates whether the attribute is an identifier (ID), an IDREF, a name, an entity 
reference, an enumeration, or CDATA. If the attribute is an identifier (ID or IDREF), the 
value field 132 stores the index of the identifier string in the name table 104. If the 
attribute is an enumeration, the value field 132 indicates the index of a list of possible 
enumeration values stored in the attribute enumeration table 108. If the attribute is 
CDATA, the value field 132 indicates the index of the default value in the CDATA table 
114. Because this information is stored in the attribute declaration table 106, the
structured document and the CT instance object need list only those attributes whose
values differ from their defaults. The attribute declaration table 106 also eliminates the need to
store information about fixed attributes in the document instance object. 

The attribute enumeration table 108 maintains a list of possible enumeration values for 
each attribute declared in the attribute declaration table 106. The attribute enumeration 
table preferably contains at least two fields. A NameTable Entry field 133 stores the
index of a first enumeration value in the name table 104. A No. of Entries field 134 stores 
the number of possible enumeration values. These enumeration values are stored 
consecutively in the name table 104, starting at the index stored in the NameTable Entry 
field 133. Optionally, an index field 135 is also provided to reference multiple entries in 
the attribute enumeration table 108. (Indeed, throughout these examples, all of the index 
columns are shown for clarity but are not explicitly part of the CT representation). 
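
The sketch below shows how a reader might interpret the value field 132 according to the type field 130, including the enumeration lookup through fields 133 and 134. The numeric type codes, the function name, and the representation of the tables as in-memory lists are assumptions. Name and entity reference attributes are left unresolved because the text does not describe their value fields.

```python
from enum import Enum


class AttrType(Enum):
    # Hypothetical codes for type field 130; the description does not assign numbers.
    ID = 0
    IDREF = 1
    NAME = 2
    ENTITY = 3
    ENUMERATION = 4
    CDATA = 5


def resolve_value_field(attr_type, value, name_table, enum_table, cdata_table):
    """Interpret value field 132 of one attribute declaration row."""
    if attr_type in (AttrType.ID, AttrType.IDREF):
        return name_table[value]                    # identifier string
    if attr_type is AttrType.ENUMERATION:
        first, count = enum_table[value]            # fields 133 and 134
        return name_table[first:first + count]      # permitted values, stored consecutively
    if attr_type is AttrType.CDATA:
        return cdata_table[value]                   # default value
    return None                                     # NAME and ENTITY: not described


names = ["magazine", "article", "draft", "final", "a1"]
enums = [(2, 2)]   # one enumeration: two values starting at name-table index 2
cdata = ["1"]
print(resolve_value_field(AttrType.ENUMERATION, 0, names, enums, cdata))  # ['draft', 'final']
```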

The notation table 110 preferably includes at least three fields: a name field 140, a type 
field 142, and a value field 144. The name field 140 stores an index to the notation 
identifier string stored in the name table 104 for each notation. The type field 142 
contains a flag (preferably a single bit) for each notation that indicates whether the 
notation is a system or public notation. The value field stores an index to the
corresponding notation value in the CDATA table 114.

The entity table 112 preferably consists of at least four fields: a name field 146, a type 
field 148, a notation field 150, and a data field 152. The name field 146 stores an index 
to the entity identifier string stored in the name table 104 for each entity. The type field 
148 indicates whether the entity is notation data (NDATA), character data (CDATA), or 
specific character data (SDATA), each of which is known in the art. If the entity type is 
NDATA, the notation field 150 stores an index to the corresponding notation string in the 
notation table 110; otherwise, the notation field 150 is empty (or ignored). The data field
152 stores an index to a corresponding entity definition in the CDATA table 114.

The CDATA table 114 preferably stores unparsed character data, such as notation values, 
external entity values, and attribute values. Like the name table, the CDATA table 114
stores each variable length string in consecutive constant-length entries that may be
indexed easily. After the DTD object 34 is fully constructed, the CDATA table 114 is
compressed using a conventional text compression technique. 
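
The sketch below stores variable length strings as consecutive constant-length entries and then compresses the finished table. The 8-byte entry size, the NUL padding, and the helper names are assumptions. zlib only stands in for the unspecified conventional text compression technique. Trailing NUL bytes are treated as padding, so this sketch cannot store strings that themselves end in NUL.

```python
import zlib

ENTRY_LEN = 8  # assumed constant entry length in bytes


def add_string(entries: list, text: str) -> tuple:
    """Append text as consecutive fixed-length entries; return (first index, count)."""
    data = text.encode("utf-8")
    first = len(entries)
    for i in range(0, len(data), ENTRY_LEN):
        entries.append(data[i:i + ENTRY_LEN].ljust(ENTRY_LEN, b"\0"))  # pad the last chunk
    return first, len(entries) - first


def get_string(entries: list, first: int, count: int) -> str:
    """Reassemble a string from its consecutive entries."""
    return b"".join(entries[first:first + count]).rstrip(b"\0").decode("utf-8")


def compress_table(entries: list) -> bytes:
    """Compress the fully constructed table (zlib as a stand-in compressor)."""
    return zlib.compress(b"".join(entries))


cdata = []
ref = add_string(cdata, "Johnson article")  # 15 bytes -> 2 entries
print(ref, get_string(cdata, *ref))         # (0, 2) Johnson article
```

The returned (first index, count) pair is the same two-value reference that the other tables use to point into the CDATA table.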

Referring to FIGURE 5, a CT instance object 32 also is composed of several tables, each 
of which contains a portion of the information required to reconstruct the document 
instance. In the preferred embodiment, the tables specific to the instance object 32 include 
at least an instance table 160, an attribute value table 162, and a child table 164, each of 
which is described below. These tables describe the document-specific element hierarchy 
and identify any attributes that are not set to the corresponding default values. The
instance object 32 also may use the four common tables in the DTD object (i.e., CDATA,
name, notation, and entity). The instance object 32 preferably will include an entity table
(i.e., will use the common entity table) only when new external entities are used in the
document instance. Likewise, the instance object 32 will include a notation table, GI
table, or CDATA table only when new notations, identifiers, or CDATA, respectively,
are used in the document instance. The index values for the instance object elements in 
the common tables are determined by treating each table as a continuation of the 
corresponding table in the DTD object. 

The instance table 160 indicates the elements occurring in the document instance. The 
instance table 160 preferably includes at least five information fields for each element: 
a generic identifier (GI) field 166, a flag field 168, a first attribute field 170, a first child 
field 172, and a number-of-children field 174. The GI field 166 stores an index to a 
corresponding generic identifier in the GI table 102 of the DTD object. The flag field 168 
contains a binary bit vector in which each bit represents a non-fixed attribute in the 
corresponding element. If the bit corresponding to a particular attribute is set, that 
attribute does not have the default value; instead, its value is stored in the attribute value
table 162 at a position located through the first attribute field 170. The first attribute field 170 stores the index of the
first non-default attribute value stored in the attribute value table 162 for the 
corresponding element. The attributes for which the corresponding bits in the flag field 
168 are set are stored consecutively in the attribute value table 162, beginning at the 
index stored in the first attribute field 170. All attributes that are set to the corresponding
default values are represented by cleared bits in the flag field 168 and are omitted in
the attribute value table 162. The bits in the flag field bit vector and the attribute values 
in the attribute value table 162 appear in the same order that the non-fixed attributes for 
the corresponding generic identifier appear in the attribute declaration table 106. The first 
child field 172 stores the index of the corresponding element's first child element (or sub- 
element) in the child table 164. The number-of-children field 174 indicates how many 
children sub-elements the corresponding element has. All sub-elements of an element are
stored consecutively and in proper hierarchical order in the child table 164. 
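
The sketch below shows how the flag field 168, the first attribute field 170, and the attribute value table 162 work together. It assumes that bit i of the flag vector, counting from the least significant bit, corresponds to the i-th non-fixed attribute in attribute declaration table order. That bit order and the function name are assumptions.

```python
def element_attributes(nonfixed_defaults, flag_bits, attr_values, first_attr):
    """Return the effective value of each non-fixed attribute of one element.

    nonfixed_defaults: default values, in attribute declaration table order
    flag_bits:         flag field 168 as an int; a set bit means "not the default"
    attr_values:       the attribute value table 162
    first_attr:        first attribute field 170
    """
    values = []
    next_value = first_attr
    for i, default in enumerate(nonfixed_defaults):
        if (flag_bits >> i) & 1:
            values.append(attr_values[next_value])  # set bit: take the next stored value
            next_value += 1
        else:
            values.append(default)                  # cleared bit: default, nothing stored
    return values


print(element_attributes(["a", "b", "c"], 0b101, ["x", "y", "Z", "W"], 2))  # ['Z', 'b', 'W']
```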

The attribute value table 162 stores at least a value for every attribute that is not set to the 
corresponding default value. Each entry in this table consists of one value: an index to the
string (e.g., an identifier string in the name table 104) representing the non-default attribute value. The
attribute name and attribute type need not be stored in this table because this information 
is stored in the attribute declaration table in the DTD object. 

The child table 164 preserves the hierarchical structure of the elements occurring in the 
document instance. Each entry in the child table 164 is represented by at least two 
information fields: a child field 176 containing a single bit indicating whether the entry 
is itself an element or is a pseudo element, and a value field 178 storing an index to the 
instance table 160 if the entry is an element or an index to the CDATA table 114 if the
entry is a pseudo element. A child element contains everything located between a matching
start tag and end tag within the parent element, while the pseudo elements on either side of
it contain the content data located before its start tag and after its end tag, respectively.
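
A child table entry combines the one-bit child field 176 with the value field 178. The sketch below packs both into one integer. Placing the type bit above the index bits is an assumption made for illustration, as are the helper names.

```python
def pack_child(is_element: bool, index: int, index_bits: int) -> int:
    """Pack (child field 176, value field 178) into one integer entry."""
    if not 0 <= index < (1 << index_bits):
        raise ValueError("index does not fit in the value field")
    return (int(is_element) << index_bits) | index


def unpack_child(entry: int, index_bits: int):
    """Return (is_element, index): an instance-table index if is_element, else a CDATA index."""
    return bool(entry >> index_bits), entry & ((1 << index_bits) - 1)


entry = pack_child(True, 5, index_bits=4)
print(entry, unpack_child(entry, 4))  # 21 (True, 5)
```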

Each field in the tables of the instance object 32 preferably contains the smallest number 
of bits possible to present the required information. For example, the GI field 166 of the 
instance table 160 should contain only enough binary bits to represent the highest index
in the GI table 102 of the DTD object. Similarly, the attribute value table 162 and the 
child table 164 are large enough only to index the respective tables in the DTD object. In 
the preferred embodiment, the bit vectors in the flag field 168 of the instance table 160 
must include one bit for each non-fixed attribute in the element type having the most
non-fixed attributes. The number-of-children field 174 must be large enough only to indicate
the number of sub-elements in the element having the most sub-elements. Each child 
table entry must have enough bits to store the highest index value for the CDATA table 
114 and the instance table 160, plus an additional bit to indicate the child type. Each entry
in the attribute value table 162 must be large enough to store the highest index in the
name table 104, CDATA table 114, entity table 112, or attribute enumeration table 108,
whichever is largest. The entries in the attribute value table 162 may have more bits than
the value field 132 of the attribute declaration table 106 because the document instance
may have added entries to the name table 104 and the CDATA table 114.
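
The sizing rules in this paragraph can be collected into one helper. Here `sizes` is a hypothetical mapping from each table to its entry count, with the instance's additions already appended to the common tables. The function and key names are assumptions.

```python
def bits_for(highest: int) -> int:
    """Bits needed to hold any value from 0 through highest (at least one bit)."""
    return max(1, highest.bit_length())


def instance_field_widths(sizes: dict, max_nonfixed: int, max_children: int) -> dict:
    return {
        "gi_field": bits_for(sizes["gi"] - 1),      # highest GI table index
        "flag_field": max_nonfixed,                  # one bit per non-fixed attribute
        "n_children_field": bits_for(max_children),  # largest child count
        # highest CDATA or instance-table index, plus one bit for the child type
        "child_entry": bits_for(max(sizes["cdata"], sizes["instance"]) - 1) + 1,
        "attr_value_entry": bits_for(
            max(sizes["name"], sizes["cdata"], sizes["entity"], sizes["enum"]) - 1),
    }


sizes = {"gi": 12, "cdata": 100, "instance": 40, "name": 30, "entity": 3, "enum": 5}
print(instance_field_widths(sizes, max_nonfixed=4, max_children=6))
# {'gi_field': 4, 'flag_field': 4, 'n_children_field': 3, 'child_entry': 8, 'attr_value_entry': 7}
```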

The last entry 180 in the instance table 160 is unique in that it provides special
information required by client computers to recreate the document instance. This entry
preferably has at least one sub-element. Additional sub-elements are pseudo elements
containing any SGML processing instructions defined in the DTD. SGML processing
instructions, which are known in the art, provide system-specific information that must
be used to reconstruct the document instance and therefore must be included in the
instance object. The SGML processing instructions are followed by the first child that is
a true element (as opposed to a pseudo element), which indicates the top-level element
of the document. Any remaining children of the last instance table entry 180 may be used
by the server to pass optional parameters, such as PI entity definitions, that help the server
and the client optimize or customize delivery of the document instance.

The entries in the document instance table 160 are ordered according to a recursive,
depth-first search of the parse tree. As a result, entries representing components of the
same element occupy consecutive locations in the name table 104, entity table 112,
CDATA table 114, instance table 160, attribute value table 162, and child table 164. Also,
the entries representing a child element preferably appear before the entries of subsequent
children of the same parent element. Furthermore, the entries representing an element's
children preferably immediately precede the element's own entries in the document
instance table 160.
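One way to obtain this ordering is a post-order traversal that emits each element only after all of its sub-elements. The sketch below builds simplified instance and child tables (element name, first child, and child count only; attributes and CDATA are omitted) and appends the unnamed pseudo entry described above, assuming there are no processing instructions. `Node` and `flatten` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


def flatten(root: Node) -> tuple[list[dict], list[int]]:
    """Lay out a parse tree as (instance_table, child_table) so that each
    element's sub-elements are stored before the element itself."""
    instance: list[dict] = []
    child: list[int] = []

    def visit(node: Node) -> int:
        # Emit every sub-tree first and remember where each child landed.
        positions = [visit(c) for c in node.children]
        first_child = len(child)
        child.extend(positions)
        instance.append({"name": node.name,
                         "first_child": first_child,
                         "num_children": len(positions)})
        return len(instance) - 1

    top = visit(root)
    # Final pseudo entry: unnamed, with the top-level element as its one child.
    child.append(top)
    instance.append({"name": "", "first_child": len(child) - 1, "num_children": 1})
    return instance, child
```

Applied to the slideshow document of the Example below, this layout places "slideshow" at index 12, the two "slide" elements at indices 8 and 11, and the pseudo entry at index 13, whose single child-table entry (index 12) points to index 12, matching the positions used in that walkthrough.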

Referring to FIGURES 6A and 6B, the server stores and transmits the DTD object in a
format understood by the client computers. The preferred format of the DTD-specific
portion 200 of the DTD object is shown in FIGURE 6A. The server first stores a fixed
length integer field 202 representing the number of enumerations contained in the
attribute enumeration table 108 (FIGURE 4). The server then stores a bit-packed version
of the attribute enumeration table 108 in a variable length field 206. Following the
attribute enumeration information is a fixed length integer field 208 indicating the number
of entries in the attribute declaration table 106 (FIGURE 4) and a variable length field
210 containing a bit-packed version of the attribute declaration table 106. The server then
stores a fixed length field 212 indicating the number of entries in the generic identifier
(GI) table 102 (FIGURE 4), followed by a variable length field 214 containing a
bit-packed version of the GI table 102.
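The specification does not fix the width of the count fields, the byte order, or the bit order of the packed tables. The sketch below assumes 4-byte big-endian counts, entries packed most-significant-bit first at a width known to both sides, and zero padding to a byte boundary. It also flattens each table entry to a single integer, whereas the real tables hold multi-field records; all function names are hypothetical.

```python
import struct


def pack_table(values: list[int], width: int) -> bytes:
    """Count field (assumed 4-byte big-endian) followed by the values packed
    MSB-first at `width` bits each, zero-padded to a whole byte."""
    acc, nbits, out = 0, 0, bytearray()
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc = (acc << width) | v
        nbits += width
        while nbits >= 8:          # flush complete bytes
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1    # keep only the unflushed bits
    if nbits:                      # pad the final partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return struct.pack(">I", len(values)) + bytes(out)


def unpack_table(buf: bytes, width: int, offset: int = 0) -> tuple[list[int], int]:
    """Inverse of pack_table; returns the values and the offset just past them."""
    (count,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    nbytes = (count * width + 7) // 8
    bits = int.from_bytes(buf[start:start + nbytes], "big")
    total, mask = nbytes * 8, (1 << width) - 1
    values = [(bits >> (total - (i + 1) * width)) & mask for i in range(count)]
    return values, start + nbytes


def dtd_specific_portion(enums, attr_decls, gis, widths) -> bytes:
    """FIGURE 6A order: enumeration table, attribute declaration table, GI table."""
    tables = (enums, attr_decls, gis)
    return b"".join(pack_table(t, w) for t, w in zip(tables, widths))
```

A reader walks the portion in the same order, passing the offset returned by one `unpack_table` call into the next.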

The preferred format of the common portion 220 of the DTD object (i.e., the portion that
may contain document instance information as well as DTD information) is shown in
FIGURE 6B. The first two fields are a fixed length integer field 222 indicating the
number of bytes in the compressed CDATA table 114 (FIGURE 4) and a variable length
field 224 containing the compressed CDATA table 114. The next two fields are a fixed
length integer field 226 indicating the number of entries in the name table 104 (FIGURE
4) and a variable length field 228 containing the strings of the name table 104. These
fields are followed by a fixed length integer field 230 indicating the number of entries in
the notation table 110 (FIGURE 4) and a variable length field 232 containing a bit-packed
version of the notation table 110. The last two fields of the common portion 220 of the
DTD object are a fixed length integer field 234 indicating the number of entries in the
entity table 112 (FIGURE 4) and a variable length field 236 containing a bit-packed
version of the entity table 112.
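The first four fields of the common portion can be sketched as follows. The text says only that the CDATA table is "compressed", so zlib, NUL-separated entries, NUL-terminated UTF-8 names, and 4-byte big-endian integers are all assumptions here, as are the function names.

```python
import struct
import zlib


def pack_common_head(cdata: list[str], names: list[str]) -> bytes:
    """Fields 222-228 of the common portion under assumed encodings: one zlib
    stream for the CDATA table (entries separated by NUL) and NUL-terminated
    UTF-8 strings for the name table."""
    compressed = zlib.compress("\0".join(cdata).encode("utf-8"))
    name_bytes = b"".join(n.encode("utf-8") + b"\0" for n in names)
    return (struct.pack(">I", len(compressed)) + compressed    # 222 (byte count), 224
            + struct.pack(">I", len(names)) + name_bytes)      # 226 (entry count), 228


def unpack_common_head(buf: bytes) -> tuple[list[str], list[str], int]:
    """Inverse of pack_common_head; also returns the offset where the notation
    table fields (230, 232) would begin. Assumes a non-empty CDATA table
    (the Example below stores a null string at index 0)."""
    (clen,) = struct.unpack_from(">I", buf, 0)
    cdata = zlib.decompress(buf[4:4 + clen]).decode("utf-8").split("\0")
    pos = 4 + clen
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    names = []
    for _ in range(count):
        end = buf.index(b"\0", pos)        # each name ends at the next NUL
        names.append(buf[pos:end].decode("utf-8"))
        pos = end + 1
    return cdata, names, pos
```

This presumably explains why field 222 counts bytes rather than entries: entry boundaries inside the compressed CDATA stream are not visible until it has been decompressed.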

Referring to FIGURE 7, the server also stores and transmits the instance object 32 in a
format understood by the client computers. The server preferably first stores a fixed
length integer field 242 indicating the number of entries in the attribute value table 162
(FIGURE 5), followed by a variable length field 244 containing a bit-packed version of
the attribute value table 162. The next two fields include a fixed length integer field 246
indicating the number of entries in the child table 164 (FIGURE 5) and a fixed length
integer field 248 indicating the number of entries in the instance table 160 (FIGURE 5).
These fixed length fields are followed by two variable length fields, the first field 250
containing a bit-packed version of the child table 164, and the second field 252 containing
a bit-packed version of the instance table 160. As discussed above, the last entry in the
instance table 160 identifies the DTD object corresponding to the instance object 32 and
provides instructions for reconstructing the document instance. The client computers
automatically retrieve the last entry in the instance table 160 to begin the reconstruction
process.
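Because the packed tables carry no length of their own, a reader needs the entry widths (for example, derived from the table sizes as described earlier) to find where each one ends. The sketch below assembles and splits the FIGURE 7 layout from tables that are already bit-packed, assuming 4-byte big-endian counts and byte-padded tables; the function names are hypothetical.

```python
import struct


def _packed_len(count: int, width: int) -> int:
    """Bytes occupied by `count` entries of `width` bits, padded to a byte (assumed)."""
    return (count * width + 7) // 8


def assemble_instance_object(attr_vals: bytes, n_attr: int,
                             child: bytes, n_child: int,
                             inst: bytes, n_inst: int) -> bytes:
    """FIGURE 7 order, from tables that are already bit-packed."""
    return b"".join([
        struct.pack(">I", n_attr), attr_vals,        # fields 242, 244
        struct.pack(">II", n_child, n_inst),         # fields 246, 248
        child, inst,                                 # fields 250, 252
    ])


def split_instance_object(buf: bytes, attr_w: int, child_w: int, inst_w: int):
    """Recover (count, packed bytes) for each table; widths must be known."""
    (n_attr,) = struct.unpack_from(">I", buf, 0)
    pos = 4 + _packed_len(n_attr, attr_w)
    attr_vals = buf[4:pos]
    n_child, n_inst = struct.unpack_from(">II", buf, pos)
    pos += 8
    child = buf[pos:pos + _packed_len(n_child, child_w)]
    pos += len(child)
    inst = buf[pos:pos + _packed_len(n_inst, inst_w)]
    return (n_attr, attr_vals), (n_child, child), (n_inst, inst)
```

Placing both counts (fields 246 and 248) ahead of both packed tables lets a client size its child and instance tables before decoding either, and the last instance entry, which starts reconstruction, sits at index `n_inst - 1`.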

Referring to FIGURES 8A and 8B, the server 26 may create a CT instance object 32 in
a manner that allows incremental delivery of the instance object 32 in either linear or
non-linear order. Incremental delivery permits a client to retrieve and present portions of a
document that are needed immediately for viewing, regardless of where they occur in the
document. For example, the client may need to begin its presentation of a document with
a hyperlink that occurs in the middle of the document. Incremental delivery also permits
a progressive-style display of a document, such as displaying all of the document's main
headings before displaying its subheadings and body. Incremental access also allows the
client to retrieve additional portions of a document only as the user attempts to view
them, so that the server transfers only those portions of the document that the user
needs. This makes better use of system resources when, for example, the
user hyperlinks to a new document before entirely viewing the current document.
Incremental delivery also allows the client to access portions of a temporally organized
document, so that parts of a document can be retrieved in a specific time order.
Incremental transfer is important for hypertext access, in which the user is likely to browse
through documents without viewing their entire contents.

As shown in FIGURES 8A and 8B, the server 26 creates an instance object 32 ready for
incremental delivery by effectively dividing the associated compact tree into sub-trees.
The server 26 divides the compact tree by partitioning the tables in the CT instance object
32. Because all of the information about an element and its children is stored as
consecutive entries in the instance object tables, the instance table may be partitioned
easily. To do so, the parser 30 generates an index 260 of element boundaries that
indicates the first and last entries in each instance object table containing information
about each element and its corresponding sub-elements. The index 260 of element
boundaries preferably consists of at least three sub-indices: a first sub-index 262 that
indicates the index bounds of information stored in the attribute value table 162 (FIGURE
5) for each element and its corresponding sub-elements; a second sub-index 264 that
indicates the index bounds of child information stored in the child table 164 (FIGURE
5) for each element; and a third sub-index 266 that indicates the index bounds of entries
in the document instance table 160 (FIGURE 5) for each element and its sub-elements.
Alternatively, the server 26 may divide the document into a uniform set of equally-sized
partitions without regard to corresponding elements and sub-elements. For a temporally
scheduled document, the server may partition the document instance object 32 into
sub-trees that preserve the time order in which the portions of the document must be presented
to a client.
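When sub-elements are stored immediately before their parent, each element's sub-tree occupies a contiguous run of entries that ends at the element's own entry, so the index of element boundaries can be built in one forward pass. The sketch below derives instance-table and child-table ranges for every sub-tree; the attribute value sub-index 262 would follow the same pattern. It assumes every child entry refers to an element (CDATA children are ignored), reduces each instance row to `(first_child, num_children)`, and uses half-open ranges. The function name is hypothetical.

```python
def boundary_index(instance: list[tuple[int, int]], child: list[int]):
    """Half-open (start, stop) ranges of instance-table and child-table entries
    covering each element's sub-tree. Assumes children precede their parent
    and every child entry names an element."""
    inst_bounds = {}
    child_bounds = {}
    for e, (first, n) in enumerate(instance):
        kids = child[first:first + n] if n else []   # already visited: they precede e
        inst_start = min([inst_bounds[k][0] for k in kids] + [e])
        starts = [child_bounds[k][0] for k in kids if child_bounds[k]]
        # A leaf owns no child-table entries; None marks an empty range.
        child_bounds[e] = (min(starts + [first]), first + n) if n else None
        inst_bounds[e] = (inst_start, e + 1)
    return inst_bounds, child_bounds
```

With these ranges, delivering the sub-tree rooted at element `e` amounts to slicing `inst_bounds[e]` from the instance table and `child_bounds[e]` from the child table.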

When a client requests incremental delivery of a document, a document delivery engine
49 in the server 26 uses the index 260 of element boundaries to determine which entries
from the three instance object tables (attribute value, child, and instance) must be sent to
the requesting client. Instead of sending each of the three tables in its entirety along with
a fixed length integer indicating the size of the table, the document delivery engine 49
delivers the requested range of entries along with a pair of fixed length integers, one of
which indicates the index of the first entry in the delivered table fragment, and the other
of which indicates the total number of entries in the delivered table fragment. When the
first sub-tree of the instance object is sent to the client, a third integer is transmitted by
the server 26 to indicate the total size of the corresponding table, which allows the client
to reserve enough memory space to receive all remaining fragments of the three tables,
if necessary. The first two integers (i.e., index of the first entry and the number of entries
in the table fragment) are used by the client to place the corresponding table fragment in
the proper position in the client's copies of the tables.
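The fragment protocol just described can be sketched as a small message type plus a client-side table that reserves space on the first fragment and files later fragments by index. Field and class names are hypothetical, and transport encoding is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    first: int              # index of the first entry in this fragment
    count: int              # number of entries in this fragment
    total: Optional[int]    # full table size, sent only with the first sub-tree
    entries: list


def make_fragment(table: list, start: int, stop: int, first_subtree: bool) -> Fragment:
    """Server side: cut entries start..stop-1 out of one instance object table."""
    return Fragment(start, stop - start,
                    len(table) if first_subtree else None, table[start:stop])


class ClientTable:
    """Client-side copy of one table, filled in as fragments arrive."""

    def __init__(self) -> None:
        self.slots: Optional[list] = None

    def receive(self, frag: Fragment) -> None:
        if self.slots is None:
            if frag.total is None:
                raise ValueError("the first fragment must carry the table size")
            self.slots = [None] * frag.total      # reserve room for every fragment
        self.slots[frag.first:frag.first + frag.count] = frag.entries
```

Fragments may arrive in any order after the first one; unfilled slots stay `None` until the client requests the sub-trees that cover them.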
In general, the name table 104, notation table 110, and entity table 112 (i.e., the common
tables that may be shared by the DTD and instance objects) cannot be partitioned
because the entries in these tables may be shared among sub-trees. However, the
invention does not exclude partitioning the CDATA table. If the CDATA table is
partitioned, then during incremental transfer mode the client computer may not receive
the CDATA partition containing the desired DTD name first, because some other
partition may arrive earlier. To handle this case, by convention the server sends the DTD
name to the client immediately before the table boundaries during incremental transfer
mode; this is needed only for the first partition.

When a client requests incremental delivery of an instance object that shares these tables 
with the DTD object, the server 26 may select between two alternative approaches. In the 
preferred embodiment, a single-bit flag stored in a flag register 270 is associated with 
each table to indicate which of the two approaches the server should use. If the flag bit 
is set, the table is small enough that the server 26 may send the entire table with each sub- 
tree. If the flag bit is cleared, the table is too large to be sent with each sub-tree, so the 
server 26 must select only those table entries that are needed to decode the particular sub- 
tree. In the latter situation, the server 26 transmits the numeric index of each entry 
selected from the table. The server 26 sets or clears the flag bits based upon the values 
of the integers associated with each table indicating the table size. 
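A sketch of the per-table decision, assuming the flag register holds one bit per common table. The text does not say what size makes a table "small enough", so `SMALL_TABLE_LIMIT` is a hypothetical threshold, and the function names are illustrative.

```python
SMALL_TABLE_LIMIT = 64  # hypothetical cut-off; the text does not give one


def flag_register(table_sizes: list[int]) -> int:
    """One bit per common table: set when the table is small enough to be
    resent whole with every sub-tree."""
    return sum(1 << i for i, n in enumerate(table_sizes) if n <= SMALL_TABLE_LIMIT)


def common_table_payload(table: list, table_no: int, register: int, needed):
    """What to send for one common table alongside one sub-tree."""
    if (register >> table_no) & 1:
        return list(table)                               # flag set: whole table
    return [(i, table[i]) for i in sorted(set(needed))]  # flag clear: indexed subset
```

Sending the numeric index with each selected entry is what lets the client file a partial table correctly when the entries it receives are not contiguous.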

When the common tables are too large to send with each sub-tree, the server 26 may
determine which entries to send to the client in one of two ways. First, the server 26,
when it parses a document instance 25, may create and store a record indicating which
entries in the common tables correspond to each partition in the DTD object 34. Second,
when the server 26 creates a partition to deliver to the client, the server 26 may parse the
portion of the original document 25 corresponding to the partition to determine which
entries of the name table 104, notation table 110, and entity table 112 are used in the
partition. One of ordinary skill will recognize that the first alternative is preferred in a
network that is more sensitive to increased processing overhead in the server during
document delivery, and that the second alternative is preferred in a network that is more
sensitive to increased storage overhead in the server.

Unlike the other three common tables, the CDATA table 114 can be partitioned, since
each piece of content data belongs to a unique element of the CT instance object 32. To
support incremental delivery of documents, the server 26 independently compresses each
entry in the CDATA table 114 and separately delivers each entry with the corresponding
instance object sub-tree when incremental access is requested. This division of the
information in the CDATA table should correspond to the partitioning of the document
into sub-trees for depth-first ordering.
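Per-entry compression can be sketched as follows. The text names no compression algorithm, so zlib is an assumption, as are the function names.

```python
import zlib


def compress_entries(cdata: list[str]) -> list[bytes]:
    """Compress every CDATA entry as its own zlib stream so that any subset
    can travel with the sub-tree that owns it."""
    return [zlib.compress(text.encode("utf-8")) for text in cdata]


def entries_for_partition(compressed: list[bytes], start: int,
                          stop: int) -> list[tuple[int, bytes]]:
    """Server side: entries start..stop-1, tagged with their table index."""
    return [(i, compressed[i]) for i in range(start, stop)]


def receive(pairs: list[tuple[int, bytes]]) -> dict[int, str]:
    """Client side: decompress each delivered entry and file it by index."""
    return {i: zlib.decompress(blob).decode("utf-8") for i, blob in pairs}
```

Compressing entries separately generally yields a lower compression ratio than one stream over the whole table, because each stream starts with an empty history and carries its own header. That is the cost of being able to ship entries independently.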

To further improve incremental retrieval, the top elements of the document tree may be 
placed in a separate partition to which other sub-trees are attached. In large documents, 
the lower-level sub-trees also may be divided in a similar fashion. Sub-tree division is 
best performed in a DTD-specific way in order to use knowledge of the document 
structure for optimization. 

In response to a client's request for incremental delivery of a document, the server 
transfers all of the entries for the highest-level sub-tree. The client then processes the
high-level structure of the document and requests only the sub-trees it requires. The client may
use conventional addressing mechanisms such as SGML IDREF or HyTime location 
addressing forms to identify any element instance within a document. When the client 
returns a request for a specific element, the server uses the address of the requested 
element to create an instance object sub-tree containing information for the requested 
element and its corresponding sub-elements. 

Progressive views of the document can be provided to the client by passing consecutive
partitions of the same root sub-tree incrementally, but with successively increasing depth.
Alternatively, a root sub-tree having N levels can be delivered as a partition to the client,
and the client can present the sub-tree progressively to the N levels.
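A breadth-first walk groups elements by depth, which is one simple way to form the successively deeper partitions described above. In this sketch nested `(name, children)` tuples stand in for a compact tree; that representation and the function name are assumptions.

```python
from collections import deque


def depth_partitions(tree) -> list[list[str]]:
    """Group element names by depth: partition k holds the elements at depth k,
    so sending partitions 0, 1, 2, ... reveals the outline first and the
    detail later. `tree` is a (name, [children]) tuple."""
    parts: list[list[str]] = []
    queue = deque([(tree, 0)])
    while queue:
        (name, kids), depth = queue.popleft()
        if depth == len(parts):          # breadth-first, so depths never skip
            parts.append([])
        parts[depth].append(name)
        queue.extend((k, depth + 1) for k in kids)
    return parts
```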

Referring again to FIGURE 1, the server 26 may be a proxy server acting as an
intermediate server between a conventional server and its client computers, including
clients located behind a "firewall" and mobile clients. The proxy server may serve as a
bridge to improve system security or system performance. If a remote server passes a
native encoding document 25 to the proxy server, the parser 30 in the proxy server
converts the document 25 into the compact tree (CT) format and then forwards the CT
document to the client.

Referring to FIGURES 9A and 9B, client computers 302 and 304 may engage in
simultaneous collaborative author-mode access to a document stored as a CT instance
object 306. At the same time, each client 302 and 304 may hold similar copies 308 and
310 of the CT instance object 306 for simultaneous viewing and modification. When one
client 302 modifies its copy 308 of the instance object by deleting, inserting, appending,
or replacing information, a document editor 312 in the client computer 302 creates an edit
object 320 that stores the modifications. The edit object 320 preferably includes one fixed
length integer field 322 containing a time-stamp for the modifications and another fixed
length integer field 324 indicating the address of the instance object sub-tree affected by
the modifications. The edit object 320 also preferably includes a table 326 having at least
two fields, the first 328 of which indicates the index of each element in the instance
object that was modified, and the second 330 of which identifies the modification that
was made.

The modifying client 302 sends the edit object 320 to the server 300, which in turn sends
the edit object 320 to the other client 304. A document editor 314 in the server 300 then
uses the edit object to modify the instance object 306 accordingly. Likewise, a document
editor 316 in the other client 304 uses the edit object 320 to modify its copy 310 of the
instance object accordingly. Edit objects may be used with incremental delivery.
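The sketch below models an edit object with the three parts described above and applies it to plain-list copies of a sub-tree. The operation names and the `(index, op, value)` row format are hypothetical, and a real compact tree would also have to renumber the indices that an insert or delete shifts, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class EditObject:
    timestamp: int                 # field 322: when the modifications were made
    subtree: int                   # field 324: address of the affected sub-tree
    changes: list = field(default_factory=list)  # table 326: (index, op, value)


def apply_edit(entries: list, edit: EditObject) -> None:
    """Apply an edit object to one copy of a sub-tree's entries (a plain list
    here). Changes run from the highest index down so that an insert or
    delete does not shift indices that are still waiting to be applied."""
    for index, op, value in sorted(edit.changes, key=lambda c: c[0], reverse=True):
        if op == "replace":
            entries[index] = value
        elif op == "insert":
            entries.insert(index, value)
        elif op == "append":
            entries.append(value)
        elif op == "delete":
            del entries[index]
        else:
            raise ValueError(f"unknown modification {op!r}")


def relay(edit: EditObject, server_copy: list, other_clients: list) -> None:
    """Server 300: update its own copy, then (simulated here by applying the
    edit directly) have each other client's editor update its copy."""
    for copy in [server_copy, *other_clients]:
        apply_edit(copy, edit)
```

The modifying client already holds the edited copy, so only the server and the other clients receive the edit object.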
Example

FIGURES 10A and 10B show a document type definition (DTD) 352 and an SGML 
document 350 generated according to the DTD 352, respectively. The DTD 352 defines 
a slide show presentation with time dependent information. According to the DTD 352, 
each slide show presentation is made up of one or more slide elements 354. Each slide 
354 in turn consists of one or more items 356 (or child elements), each item being either 
an image element 358 or an audio element 360. Each slide 354 also has two attributes: 
a "name" attribute 362 that uniquely identifies the corresponding slide, and a "next" 
attribute 364 that indicates the name of the next slide in the presentation. 

Each item element 356 (image or audio) has two attributes providing temporal 
information about the presentation of the element. A "start" attribute 366 indicates an 
amount of time to delay presentation of the item after the corresponding slide presentation 
has begun. When the "start" attribute 366 is set to the default value 368 of zero, the image 
or audio element is presented precisely when the presentation of the slide begins. A 
"units" attribute 370 indicates whether the "start" attribute 366 is measured in seconds 
372, minutes 374, or hours 376. According to the DTD 352, "seconds" is the default 
value 378 of the "units" attribute 370. 

Image elements 358 and audio elements 360 each include a data attribute 380 and 381, 
respectively, that is an SGML external entity, such as an image file or an audio file. The 
DTD 352 defines a notation 382 for image elements ("gif" represents the ".gif" format
for image data) and a notation 384 for audio elements ("ulaw" represents the ".ulaw"
format for audio data). Both notations 382 and 384 are defined as "SYSTEM" type 
notations. Image elements 358 also include an "x" attribute 386 and a "y" attribute 388, 
which define the x-y location of the image in the slide presentation space. Both the "x" 
and "y" attributes 386 and 388 have default values 390 and 391 of zero. 

In FIGURE 10B, the document instance 350 declares that the "slideshow" document is 
defined by the document type definition "slideshow.dtd" 464, shown in FIGURE 10A. 
The document instance 350 also declares a notation and five entities not declared in the
DTD 352. The declared notation 464 ("postscript") is assigned the Adobe PostScript 3.0
format 466 ("PS-Adobe-3.0"). The first two entities, an "audio-1" entity 468 and an
"audio-2" entity 470, are "NDATA" entities associated with two audio data files, an
"audio-1.ulaw" file 472 and an "audio-2.ulaw" file 474, respectively. The other three
entities, an "image-1" entity 476, an "image-2" entity 478, and an "image-3" entity 480,
are "NDATA" entities associated with three image data files, an "image-1.gif" file 482,
a "doc.ps" file 486, and an "end.gif" file 484, respectively.

After the document type and entity declarations, the document instance 350 defines the 
highest-level element in the document hierarchy, the "slideshow" element 488, which has
two sub-elements, a first "slide" 490 and a second "slide" 492. The first "slide" element 
490 has two attributes, a "name" attribute 494 that indicates the name of the first "slide" 
element 490 ("first") and a "next" attribute 496 that indicates the name of the second slide 
element. Because no slide follows the second "slide" element 492, the second "slide" 
element 492 has only a "name" attribute 498 indicating the name of the second slide 492 
("end"). 

The "first" slide 490 consists of four item elements, two audio elements 500 and 506,
and two image elements 502 and 504. The first audio and image elements 500 and 502,
which include the "audio-1" and "image-1" entities, are presented simultaneously with
the beginning of the first slide 490 since no delay period is specified. The second image
element 504, which includes the "image-2" entity, is displayed two seconds after the
first image element 502 is displayed (i.e., the "start" attribute 507 for the second image
504 has a value of "2", and the "units" attribute 508 has a value of "seconds"). The
second audio element 506, which includes the "audio-2" entity, begins 130 seconds
after the beginning of the "first" slide 490 (i.e., the "start" attribute 510 for the second
audio element has a value of "130").
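The timing rules of the DTD reduce to multiplying "start" by the length of its unit, with "0" and "seconds" filling in when an attribute is omitted. A minimal sketch, with an item's attributes passed as a plain dictionary (an assumption) and a hypothetical function name:

```python
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def start_offset(attrs: dict) -> float:
    """Seconds after the slide begins at which an item is presented, applying
    the DTD defaults start="0" and units="seconds" when an attribute is absent."""
    start = float(attrs.get("start", "0"))
    return start * UNIT_SECONDS[attrs.get("units", "seconds")]
```

For the four items of the "first" slide this gives offsets of 0, 0, 2, and 130 seconds.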
The "end" slide 492, which is presented after the "first" slide 490, consists of a single
image element 512. This image element 512 includes the "image-3" entity and is 
displayed simultaneously with the beginning of the "end" slide presentation. 

FIGURES 11A and 11B show the tables that make up the compact trees (CT) for the
DTD 352 and document instance 350. The CDATA table 392, the name table 394, and
the notation table 396 are common tables containing data representing both the DTD 352
and the document instance 350. The other tables shown in FIGURE 11A contain data
representing only the DTD 352, and the tables in FIGURE 11B contain data representing
only the document instance 350.

The name table 394 of FIGURE 11A includes 18 entries corresponding to the DTD 352
of FIGURE 10A and 8 entries corresponding to the document instance 350 of FIGURE
10B. The first entry (index 0) in the name table 394 is a null string, which is associated
with objects that do not have a name (e.g., the last entry in the document instance table).
The next 17 entries are entered according to a depth-first recursive pass through the DTD
352. Therefore, elements appearing at lower levels of the DTD hierarchy appear first in
the name table 394, while elements appearing at higher levels of the DTD hierarchy
appear later in the name table 394. For example, the "audio" element 360 and the "image"
element 358 are the lowest level elements in the DTD 352, so the corresponding entries
in the name table 394 (index 1 and index 3, respectively) appear before the entries
representing "item" element 356 (index 6), "slide" element 354 (index 12), and
"slideshow" element 353 (index 15). Likewise, the attributes associated with each
element immediately follow the element name in the name table 394, unless a similarly
named attribute appears higher in the table. For example, the entry representing the "data"
attribute 381 of the "audio" element 360 immediately follows the name table entry for the
"audio" element 360, but is not duplicated below the entry for the "image" element 358,
which also includes a "data" attribute 380. Likewise, entries corresponding to the "x"
attribute 386 and "y" attribute 388 of "image" element 358 immediately follow the entry
for the "image" element 358. The last two entries (indexes 16 and 17) in the DTD portion
of the name table 394 are associated with the "gif" and "ulaw" notations 382 and 384
declared in the DTD 352. 
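The name table behaviour described above, with lower-level element names first and each name stored only once, can be sketched as an interning table filled by a depth-first pass. The `dtd` dictionary shape is hypothetical, enumeration values and notation names are left out, and names are interned in the order given, so the result does not reproduce FIGURE 11A exactly.

```python
class NameTable:
    """Append-only string table; a name that is already present reuses its
    index instead of being stored twice (index 0 is the null name)."""

    def __init__(self) -> None:
        self.names = [""]
        self.index = {"": 0}

    def intern(self, name: str) -> int:
        if name not in self.index:
            self.index[name] = len(self.names)
            self.names.append(name)
        return self.index[name]


def add_dtd_names(table: NameTable, elem: str, dtd: dict, seen=None) -> None:
    """Depth-first: sub-element types are interned before their parent, and an
    element's attribute names right after the element name. `dtd` maps an
    element name to (sub-element names, attribute names)."""
    seen = set() if seen is None else seen
    if elem in seen:                 # guard against recursive content models
        return
    seen.add(elem)
    subs, attrs = dtd[elem]
    for s in subs:
        add_dtd_names(table, s, dtd, seen)
    table.intern(elem)
    for a in attrs:
        table.intern(a)
```

Because `intern` returns the existing index for a repeated name, the "data" attribute of "image" resolves to the same entry as the "data" attribute of "audio".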

To reconstruct the original SGML document 350 from the tables of the compact tree, the 
computer requesting the document first reads the last entry (index 13) in the document 
instance table 450 to identify the corresponding document type definition and the location 
of the top level element (the "slideshow" element 488) in the document instance table 
450. Because the last entry in the document instance table 450 does not refer to an 
element of the document, no generic identifier or attribute is associated with the entry. 
Therefore, the corresponding "GI" and "first attribute" column entries are not examined. 
The client computer obtains the name of the required DTD from the last entry in the 
CDATA table 392 (index 12). The last entry in the document instance table 450 has at 
least one corresponding entry in the child table 444, one of which identifies the location 
of the highest level element in the document instance table 450. In this case, the last entry 
in the document instance table 450 includes exactly one child element, which is stored 
as the last entry in the child table 444 (index 12). This child element indicates the location 
(index 12) of the highest level element ("slideshow") of the compact tree in the document 
instance table 450. 
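The lookup chain just described (last instance entry, then child table, then instance table, then GI and name tables) can be sketched recursively. This assumes the pseudo entry has no processing-instruction children, so its first child is the top-level element, and that every child entry refers to an element; each instance row is reduced to `(gi, first_child, num_children)`, and `gi_table[gi]` holds the name-table index from the "identifier" field. The function name is hypothetical.

```python
def rebuild(instance, child, gi_table, names, e=None):
    """Rebuild a nested (name, children) tree from compact-tree tables."""
    if e is None:
        # The last entry is a pseudo element; assuming no processing
        # instructions, its first child is the top-level element.
        _, first, _ = instance[-1]
        e = child[first]
    gi, first, n = instance[e]
    name = names[gi_table[gi]]           # GI table -> name table
    return (name, [rebuild(instance, child, gi_table, names, child[k])
                   for k in range(first, first + n)])
```

Filled with values modelled on FIGURE 11B (the GI index used for "image" is an assumption, since the text does not state it), the function returns the nested slideshow, slide, and item structure of the example document.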

The computer then reads the document instance table entries for the highest level element 
to gather information about that element and its children elements. The "GI" field 452 
contains a value of "1", indicating that the corresponding generic identifier information 
is located in the second position (index 1) of the GI declaration table 416. The "identifier" 
field 418 in the GI declaration table contains a value of "15", indicating that the name of
the highest level element ("slideshow") is located in the sixteenth entry (index 15) in the 
name table 394. The "first attribute" field 420 in the GI declaration table contains a value 
("8") greater than the highest index of the attribute declaration table 408, which indicates 
that the "slideshow" element has no corresponding attributes. The requesting computer 
then returns to the document instance table and finds that because the "slideshow" 
element has no corresponding attributes, the "flags" field 454 has no bits set and the "first 
attribute" field 456 contains an invalid index value ("10"). The "first child" field 458 and 
the "number of children" field 460 indicate that the "slideshow" element has two sub- 
elements at the eleventh and twelfth positions in the child table 444. The "value" field 
448 of the child table 444 indicates that information for the first of these sub-elements is 
located in the ninth position (index 8) of the document instance table 450 and that 
information for the second sub-element is located in the twelfth position (index 11) of the
document instance table 450. 

Reading the information contained in the ninth entry of the document instance table 450, 
the computer learns that the generic identifier for the sub-element is contained in the fifth 
position (index 4) of the GI declaration table 416. This entry in the GI declaration table 
416 points the computer to the thirteenth position (index 12) of the name table 394, which
indicates that the sub-element is a "slide" element. The attribute fields 420, 422, and 424
of the GI declaration table 416 indicate that each slide element has two associated unfixed
attributes that are identified by the seventh and eighth entries (indices 6 and 7) in the
"attribute" declaration table 408. The first attribute of each "slide" element is an
IDREF-type attribute, the name ("next") of which is contained in the fifteenth entry (index 14)
in the name table 394. The "next" attribute has no default value, as indicated by the null
string contained in the first entry (index 0) of the CDATA table 392. The second attribute
of the "slide" element is an ID-type attribute, the name of which ("name") is contained
in the fourteenth entry (index 13) in the name table 394. The "name" attribute also has 
a null default value. 

After identifying the "name" and "next" attributes associated with the "slide" element, 
the computer returns to the ninth entry of the document instance table 450 and reads the 
values of the bits in the corresponding entry of the "flags" field 454. Because each of the 
first two bits is set, the computer knows it must retrieve the actual values of the "name" 
and "next" attributes from the attribute value table 440. The "first attribute" field 456 
directs the computer to the seventh and eighth positions (indices 6 and 7) of the attribute 
value table 440, which in turn refers the computer to the twenty-fourth and twenty-fifth 
positions (indices 23 and 24) of the name table 394. From the name table, the computer
learns that the "name" of the first slide element is "first" (index 24) and that the "next"
slide element is the "end" slide element (index 23). The "first child" and "number of
children" fields 458 and 460 indicate that the "first" slide element has four sub-elements,
which are listed sequentially beginning at the fifth position (index 4) of the child table
444. The computer then accesses the value field 448 of the child table 444 to learn that
information for the four sub-elements of the "first" slide element is contained in the
second, fourth, sixth, and eighth entries (indices 1, 3, 5, and 7) of the document instance
table. 
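The flag-driven lookup just described can be sketched as follows. It assumes bit i of the flags value corresponds to the i-th non-fixed attribute counting from the least significant bit, and that every explicit value is a name-table index, as it is for the "slide" attributes; values stored in other tables are not handled. The function name is hypothetical.

```python
def attribute_values(decls, flags, first_attr, attr_value_table, names):
    """Resolve an element's non-fixed attributes.
    decls: list of (attr_name, default) in declaration order.
    flags: bit i set when attribute i has an explicit value in the instance.
    Explicit values are read consecutively from attr_value_table starting at
    first_attr; each one is taken to be a name-table index."""
    result, next_slot = {}, first_attr
    for i, (attr, default) in enumerate(decls):
        if (flags >> i) & 1:
            result[attr] = names[attr_value_table[next_slot]]
            next_slot += 1
        else:
            result[attr] = default   # flag clear: fall back to the DTD default
    return result
```

With flags 0b11 and "first attribute" 6, the "slide" element's attribute value entries 6 and 7 resolve through name-table indices 23 and 24 to next="end" and name="first"; with no flags set, an "item" element simply receives its defaults.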

The computer next accesses information for the first sub-element of the "first" slide. The 
corresponding entry in the "GI" field 452 of the document instance table 450 indicates 
that the generic identifier is identified in the first entry (index 0) in the GI declaration
table 416. The "identifier" field 418 of the GI declaration table 416 in turn directs the
computer to the seventh entry (index 6) of the name table 394, which identifies the first
sub-element as an "item" element. The GI declaration table 416 then directs the computer
to the fifth and sixth entries (indices 4 and 5) in the attribute declaration table 408 for
information about the two unfixed attributes associated with the "item" element. The first
attribute is a CDATA attribute, the name of which ("start") is located in the twelfth
position (index 11) of the name table 394. The "start" attribute has a default value of "0"
as indicated by the "default" field 414 of the attribute declaration table 408 and the second
entry (index 1) of the CDATA table 392. The second attribute of the "item" element is 
the "units" attribute, as indicated by the "name" field 410 of the attribute declaration table 
408 and the eighth entry (index 7) in the name table 394. The "units" attribute has three 
possible values, "seconds", "minutes", and "hours", which are referenced by the 
enumeration table 404 as the ninth through eleventh positions (indices 8-10) of the name 
table 394. The default value of the "units" attribute is "seconds". 

After identifying the attributes associated with the "item" element, the computer returns 
to the document instance table 450 and skips the "first attribute" field since none of the 
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bits in the "flags" field are set The computer then reads information from the "first child" 
field 458 and the "number of children" field 460 to learn that the first "item" element has 
one sub-element identified by the first entry (index 0) in the child table. The "value" field 
448 of the child table 444 directs the computer to the first entry (index 0) in the document 
instance table 450 for information about this sub-element The "GI" field 452 of the 
document instance table 450 indicates that the generic identifier for the sub-element is 
identified by the third entry (index 2) of the GI declaration table 416. The "identifier" 
field 418 of the GI declaration table 416 indicates that this sub-element is an "audio" 
element as specified in the second position (index 1 ) of the name table 394. The "unfixed 
attribute" field 424 of the GI declaration table 416 indicates that the "audio" element has 
a single attribute which, according to the "first" field 420 of the GI declaration table 416, 
is identified by the first entry (index 0) of the attribute declaration table 408. The "name" 
field 410 of the attribute declaration table 408 and the third entry (index 2) of the name 
table 394 indicate that this attribute is a "data" attribute. This "data" attribute is an entity 
having a Null default value, as indicated in the "default" field 414 of the attribute 
declaration table 408 and the first entry (index 0) of the CDATA table 392. Because the 
first bit of the corresponding entry in the "flags" field 454 of the document instance table 
450 is set, the computer must look to the attribute value table entry (index 0) identified 
in the "first attribute" field 456 of the document instance table 450 to learn the value of 
the "data" attribute. This entry in the attribute value table 440 directs the computer to the 
nineteenth entry (index 18) in the name table 394 (i.e., the first name table entry 
corresponding to the document instance), which indicates that the name of the "data" 
attribute is "audio-1". Also, because the attribute is an entity, the computer accesses the 
first entry (index 0) in the entity table to determine which entity is associated with the 
"audio-1" attribute. The "type" field 434 of the entity table 430 indicates that the entity 
is of the type "NDATA", and the "notation" field 436 indicates that the entity associated 
with the "audio-1" attribute is defined in the second entry (index 1) of the notation table
396. The "name" field 398 of the notation table 396 indicates that the entity has the "ulaw"
notation, as listed in the eighteenth entry (index 17) of the name table 394. The notation
table 396 also indicates that the "audio-1" attribute is a SYSTEM type object, as indicated
in the "type" field 400 of the notation table 396. The "value" field 438 of the entity table
430 then directs the computer to the fifth entry (index 4) of the CDATA table 392 for the
name of the entity associated with the "audio-1" attribute. The CDATA table 392
specifies that the "audio-1.ulaw" data file is associated with the "audio-1" attribute of the
first "audio" element. The computer returns to the first entry (index 0) of the document 
instance table 450 and learns that the first "audio" element has no sub-elements. 
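The chain of lookups described above (document instance entry, then GI declaration, then name table, and from the GI declaration into the attribute declaration table) can be sketched in Python. This is a minimal illustration, not the stored bit-level format: tables are modeled as plain lists, the class and function names are hypothetical, and the toy tables hold only indices the walkthrough states. The name of attribute declaration entry 1 is assumed to be "data"; `None` and `"?"` mark entries the text does not describe.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GIDecl:
    identifier: int  # "identifier" field 418: index into the name table
    first: int       # "first" field 420: first entry in the attribute declaration table
    unfixed: int     # "unfixed attribute" field 424: number of unfixed attributes

@dataclass
class AttrDecl:
    name: int        # "name" field 410: index into the name table

@dataclass
class InstanceEntry:
    gi: int          # "GI" field 452: index into the GI declaration table

def element_name(i: int, instance: List[Optional[InstanceEntry]],
                 gi_table: List[Optional[GIDecl]], names: List[str]) -> str:
    """Document instance entry -> GI declaration -> name table."""
    decl = gi_table[instance[i].gi]
    return names[decl.identifier]

def attribute_names(i: int, instance: List[Optional[InstanceEntry]],
                    gi_table: List[Optional[GIDecl]], attr_table: List[AttrDecl],
                    names: List[str]) -> List[str]:
    """Names of the unfixed attributes declared for element i's generic identifier."""
    decl = gi_table[instance[i].gi]
    span = attr_table[decl.first:decl.first + decl.unfixed]
    return [names[a.name] for a in span]

# Toy tables (hypothetical contents beyond the indices given in the text).
names = ["?", "audio", "data", "image", "x", "y"]
gi_table = [None, None, GIDecl(1, 0, 1), GIDecl(3, 1, 3)]
attr_table = [AttrDecl(2), AttrDecl(2), AttrDecl(4), AttrDecl(5)]  # entry 1 -> "data" assumed
instance = [InstanceEntry(2), None, InstanceEntry(3)]

print(element_name(0, instance, gi_table, names))                  # audio
print(attribute_names(2, instance, gi_table, attr_table, names))   # ['data', 'x', 'y']
```

Keeping every cross-reference as a small integer index is what lets each field be stored in only a few bits.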

The computer then moves to the fourth entry (index 3) of the document instance table 
450, which represents the second sub-element of the first "slide" element. Like the first 
sub-element of the "slide" element, the second sub-element is an "item" element having 
two unfixed attributes, a "start" attribute and a "units" attribute. Because none of the bits 
in the corresponding entry in the "flags" field 454 of the document instance table 450 is 
set, both the "start" and the "units" attributes are set to the default values, which means 
that the second "item" element begins simultaneously with the first "item" element. The "first
child" field 458 and "number of children" field 460 indicate that the second "item" 
element also has a single sub-element, which is identified by the second entry (index 1) 
of the child table 444. Information for the sub-element is contained in the third entry 
(index 2) of the document instance table 450. 

The "GI" field 452 of the document instance table 450, the "identifier" field 418 of the 
GI declaration table 416, and the fourth entry (index 3) of the name table 394 indicate that 
the only sub-element of the second "item" element is an "image" element. The "unfixed 
attribute" field 424 of the GI declaration table 416 indicates that the "image" element has
three unfixed attributes, which are listed consecutively beginning with the second entry 
(index 1) of the attribute declaration table 408. The first attribute of the "image" element 
is a "data" entity having a null default value. The other two attributes are the "x" and "y" 
values that indicate the "x-y" position of the "image" element in the presentation space. 
These attributes are identified in the "name" field 410 of the third and fourth entries 
(indices 2 and 3) in the attribute declaration table 408 and the fifth and sixth entries 
(indices 4 and 5) in the name table 394. The "default" field 414 of the attribute 
declaration table 408 and the second entry (index 1) in the CDATA table 392 indicate that
both the "x" and "y" attributes have a default value of "0". The "flags" field 454 indicates
that the "x" and "y" attributes of the first "image" element have the default values, while 
the value of the "data" attribute of the first "image" element is specified by the second 
entry (index 1) of the attribute value table. The "value" field 442 of the attribute value 
table 440 and the twentieth entry (index 19) of the name table 394 identify "image-1" as
the name of this "data" attribute. The second entry (index 1) of the entity table and the 
first entry (index 0) in the notation table 396 identify the "image-1" attribute as a 
SYSTEM type object of the ".gif" file type. The "image-1" attribute is associated with 
the "image-1.gif" data file, as indicated by the sixth entry (index 5) of the CDATA table
392. 
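The flag mechanism above (a set bit sends the computer to the attribute value table, a cleared bit means the declared default applies) can be sketched as follows. This is an illustrative assumption, not the stored format: `resolve_attributes` is a hypothetical helper, flags are modeled as a list of booleans in declaration order to avoid committing to a bit order, and non-default values are assumed to occupy consecutive attribute value table entries starting at the "first attribute" index. Attribute values are shown already dereferenced through the name table.

```python
from typing import List, Optional, Sequence

def resolve_attributes(flags: Sequence[bool], first_value: int,
                       defaults: Sequence[Optional[str]],
                       attr_values: List[str]) -> List[Optional[str]]:
    """Effective values of an element's unfixed attributes, in declaration order.

    flags[k] is True when attribute k is NOT at its default (flag field 454).
    Non-default values are assumed to be stored consecutively in the attribute
    value table, starting at first_value (the "first attribute" field 456).
    """
    resolved: List[Optional[str]] = []
    next_value = first_value
    for is_set, default in zip(flags, defaults):
        if is_set:
            resolved.append(attr_values[next_value])  # explicit value
            next_value += 1
        else:
            resolved.append(default)                   # declared default
    return resolved

# Attribute value table, already dereferenced through the name table for brevity.
attr_values = ["audio-1", "image-1"]

# First "image" element: "data" set explicitly; "x" and "y" keep their "0" defaults.
# None stands for the Null default of the "data" attribute.
print(resolve_attributes([True, False, False], 1, [None, "0", "0"], attr_values))
# ['image-1', '0', '0']
```

Because only non-default values are stored, an element whose attributes all take their defaults (such as the second "item" element) costs nothing in the attribute value table.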

The computer then retrieves information about the other two sub-elements of the first 
"slide" element from the sixth and eighth entries (indices 5 and 7) in the document 
instance table 450. The computer learns that each of the sub-elements is an "item" element
having a single sub-element. The first of these "item" elements includes an "image"
sub-element that is associated with the "PS-Adobe-3.0" data file and that has a presentation
delay of two seconds. The second of these "item" elements includes an "audio" sub- 
element that is associated with the "audio-2.ulaw" data file and that has a presentation 
delay of 130 seconds. 

After the computer displays the second "audio" sub-element of the first "slide" element, 
it moves to the second "slide" element in the "slideshow" presentation. As discussed 
above, the second "slide" element is represented by information contained in the twelfth 
entry (index 11) of the document instance table 450. Because the second "slide" element
is the last element of the "slideshow" presentation, the first bit of the "flags" field 454 in
the document instance table 450 is cleared and the second bit is set, indicating that the
"next" attribute of the "slide" element has the null default value, while the "name" 
attribute has the value ("end") identified by the last entry (index 9) in the attribute value 
table 440. The second "slide" element has a single "item" sub-element, which in turn has
a single "image" sub-element associated with the "end.gif" data file. This third "image"
element is displayed at the default "x-y" location (0,0) with the default presentation
delay of zero seconds. Once the second
"slide" element has been presented, the requesting computer has fully reconstructed and 
displayed the "slideshow" document. 
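The reconstruction described above visits each element before its sub-elements, reaching children through the "first child" and "number of children" fields and the child table. A minimal sketch of such a walk follows, assuming a pre-order depth-first traversal (consistent with the order narrated above, but an assumption about the client). `Node` and `presentation_order` are hypothetical names, element names are taken as already resolved, and the five-entry tree is a toy example, not the table of FIGURE 11B.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    gi_name: str      # already resolved through the GI and name tables
    first_child: int  # "first child" field 458: index into the child table
    n_children: int   # "number of children" field 460

def presentation_order(root: int, instance: List[Node],
                       child_table: List[Tuple[str, int]]) -> List[str]:
    """Pre-order walk: each element is visited before its sub-elements."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = instance[stack.pop()]
        order.append(node.gi_name)
        kids = child_table[node.first_child:node.first_child + node.n_children]
        # Push in reverse so the first child is popped (visited) first.
        for kind, idx in reversed(kids):
            if kind == "ELEMENT":  # PSEUDO-ELEMENT entries point at CDATA content instead
                stack.append(idx)
    return order

# Hypothetical toy tree: one slide holding two items, each with one media element.
instance = [
    Node("slide", 0, 2),  # entry 0: children are child-table entries 0..1
    Node("item", 2, 1),   # entry 1: child-table entry 2
    Node("audio", 0, 0),  # entry 2: no sub-elements
    Node("item", 3, 1),   # entry 3: child-table entry 3
    Node("image", 0, 0),  # entry 4: no sub-elements
]
child_table = [("ELEMENT", 1), ("ELEMENT", 3), ("ELEMENT", 2), ("ELEMENT", 4)]
print(presentation_order(0, instance, child_table))
# ['slide', 'item', 'audio', 'item', 'image']
```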

The first four entries of the CDATA table 392 represent information in the DTD, so every 
CDATA field in one of the other tables in FIGURE 11A must contain two bits to
reference an entry in the CDATA table 392. Likewise, the first 18 entries of the name 
table 394 represent information in the DTD, so every name field in the other tables of 
FIGURE 11A must contain five bits to reference an entry in the name table 394. 
Therefore, each of the first two entries in the notation table 396, both of which represent 
information in the DTD, must be eight bits in length: five bits representing the index of 
the notation name 398 in the name table 394; one bit indicating the notation type 400 (i.e.,
"SYSTEM" or "PUBLIC"); and two bits representing the index of the notation value 402 
in the CDATA table 392. Because the notation table 396 has two entries representing 
information in the DTD, only one bit is needed in the other tables of FIGURE 11A to
access information in the notation table 396. No entities are defined in the DTD of
FIGURE 10A, so no entity table is shown in FIGURE 11A.
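The index widths above match the smallest number of bits able to address every entry of the referenced table, that is, the ceiling of log2 of the table size. A minimal sketch, assuming that rule (`index_bits` is a hypothetical helper), reproduces the CDATA, name, and notation widths and the eight-bit DTD notation entry, as well as the ten-bit document-instance notation entry described later in this example.

```python
def index_bits(table_size: int) -> int:
    """Bits needed to address any of table_size entries (indices 0..size-1)."""
    return max(table_size - 1, 0).bit_length()

# DTD-side table sizes stated in the text.
cdata_bits = index_bits(4)      # 4 CDATA entries   -> 2 bits
name_bits = index_bits(18)      # 18 name entries   -> 5 bits
notation_bits = index_bits(2)   # 2 notation entries -> 1 bit

# A DTD notation entry: name index + 1-bit SYSTEM/PUBLIC type + CDATA index.
notation_entry_bits = name_bits + 1 + cdata_bits
print(cdata_bits, name_bits, notation_bits, notation_entry_bits)  # 2 5 1 8

# Once the instance adds entries (26 names, 13 CDATA strings), the widths grow.
instance_notation_bits = index_bits(26) + 1 + index_bits(13)      # 5 + 1 + 4 = 10
```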

The DTD of FIGURE 10A defines only a single enumeration with three possible values. 
Therefore, the enumeration table 404 references only three entries, beginning at the index 
in the name table 394 indicated by a five-bit NameTable field 406. 

Each entry in the attribute declaration table 408 includes three fields: a five-bit name field 
410 representing the index of the attribute name in the name table 394; a three-bit type 
field 412 indicating which of the possible types (name, entity, CDATA, IDREF, ID, or 
enumeration) each attribute takes on; and a five-bit default value field 414 representing 
the index of the default value in the name table 394. Because the attribute declaration 
table 408 has eight entries, every entry in one of the other tables of FIGURE 11A
referencing the attribute declaration table 408 must contain four bits. 

In the preferred embodiment, each entry in the GI declaration table 416 has 11 bits: five
bits representing the generic identifier name 418 in the name table 394; four bits 
representing the index of the first attribute 420 in the attribute declaration table 408; and 
two bits representing the number of unfixed attributes 424 (no GI in the DTD has more 
than three possible unfixed attributes). Because the DTD of FIGURE 10A defines no
fixed attributes for any GI, no bits are needed in the "number of fixed attributes"
field 422. 
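Packing the fixed-width fields of one table entry into a single integer, and splitting it back, might look like the sketch below for the 11-bit GI declaration entry. The text gives only the field widths; the field order and the most-significant-first bit layout are assumptions, and `pack`, `unpack`, and `GI_WIDTHS` are hypothetical names. The sample values are those the walkthrough gives for "audio" (name index 1, first attribute 0, one unfixed attribute).

```python
from typing import Iterable, List, Sequence, Tuple

def pack(fields: Iterable[Tuple[int, int]]) -> int:
    """Concatenate (value, width) pairs, first field in the most significant bits."""
    word = 0
    for value, width in fields:
        if value >= (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def unpack(word: int, widths: Sequence[int]) -> List[int]:
    """Inverse of pack(): split word back into fields of the given widths."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return values[::-1]

# GI declaration layout from the text: name (5), first attribute (4), unfixed count (2).
# The zero-width "number of fixed attributes" field is simply omitted.
GI_WIDTHS = [5, 4, 2]

audio = pack(zip([1, 0, 1], GI_WIDTHS))  # "audio": name 1, first attribute 0, 1 unfixed
print(audio, unpack(audio, GI_WIDTHS))   # 65 [1, 0, 1]
```

The same two helpers would serve any other entry layout in FIGURES 11A and 11B by changing the list of widths.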

In addition to the four entries representing the DTD, the CDATA table 392 also includes 
nine entries representing the document instance. Therefore, any CDATA fields in the 
document instance tables of FIGURE 11B must contain four bits to reference an entry in
the CDATA table 392. The name table 394 contains eight entries representing the 
document instance, yielding 26 total entries in the name table 394. As a result, five-bit 
fields are used in the document instance tables of FIGURE 11B to reference the name
table 394. The notation table 396 contains one entry representing the document instance, 
in addition to the two DTD entries, so two-bit fields are used in the document instance 
tables to reference the notation table 396. In the preferred embodiment, the entry in the 
notation table 396 representing the document instance uses 10 bits: five bits representing 
the index of the notation name 398 in the name table 394, one bit indicating the notation 
type 400 ("PUBLIC"), and four bits representing the index of the notation value 402 in 
the CDATA table 392. 

Because the document instance of FIGURE 10B defines five entities, the compact tree 
for the document instance includes an entity table 430. Entries in the other document 
instance tables contain three bits to reference the five entries in the entity table 430. Each 
entry in the entity table 430 itself contains 13 bits: five representing the index of the 
entity name 432 in the name table 394; two indicating the entity type 434 (NDATA, 
SDATA, or CDATA); two representing the index of the entity notation 436 in the
notation table 396; and four representing the index of the entity value 438 in the CDATA 
table 392. 

The compact tree for the document instance also includes an attribute value table 440, 
each entry of which includes a five-bit attribute value field 442 representing the index of
the attribute name in the name table 394. A child table 444 includes a one-bit field 446 
indicating the type of each child entry ("ELEMENT" or "PSEUDO-ELEMENT") and a
four-bit field 448 representing the index of the child entry in either the CDATA table 392 
or the document instance table 450, discussed below. 

The document instance table 450 has 15 bits per entry. Five bits represent the index of the 
entry's GI 452 in the GI declaration table 416. A three-bit flag vector 454 indicates 
which, if any, of the three possible unfixed attributes are not set to the corresponding
default value. For "slide" elements (GI table index of "4" and name table index of "12"),
the first bit in the flag field 454 indicates whether the "name" attribute is set to the default
value, and the second bit indicates whether the "next" attribute is set to the default value. 
The third bit is not used. For "item" elements (GI table index of "0" and name table index 
of "6"), the first bit indicates whether the "units" attribute is set to the default value, and 
the second bit indicates whether the "start" attribute is set to the default value. The third 
bit is not used. For "audio" elements (GI table index of "2" and name table index of "1"), 
the first bit indicates whether the "data" attribute is set to the default value, and the other 
two bits are not used. For "image" elements (GI table index of "3" and name table index
of "3"), the first bit indicates whether the "data" attribute is set to the default value, and the
second and third bits indicate whether the "x" and "y" attributes, respectively, are set to 
the corresponding default values. 

Each entry in the document instance table 450 also contains four bits representing the 
index for the attribute value table 440 entry holding the actual value of the first unfixed 
attribute 456 that is not set to the default value. Four bits represent the index of the 
entry's first child 458 in the child table 444. Because no element in the document instance
has more than four children, three bits are used to indicate the number of children 460 for 
each entry in the instance table 450. 

In this example, the actual DTD file uses approximately 455 bytes of storage space, while 
the compact tree representation of the DTD consumes only 128 bytes. Likewise, the
actual document instance file requires approximately 731 bytes of storage space, while 
the compact tree representation of the document instance uses only 183 bytes. 
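Expressed as ratios, these figures put each compact form at roughly a quarter of the original file size. The combined ratio below is derived here by summing the two files; the text itself reports only the individual sizes.

```latex
\frac{128}{455} \approx 0.28, \qquad
\frac{183}{731} \approx 0.25, \qquad
\frac{128 + 183}{455 + 731} = \frac{311}{1186} \approx 0.26
```

Equivalently, the DTD shrinks by a factor of about 3.6 and the document instance by a factor of about 4.0.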

Referring to FIGURE 12, a sample SGML document instance 550 generated according 
to the DTD 352 of FIGURE 10A includes a top-level "slideshow" element 552 and three 
"slide" sub-elements: a "first" slide element 554, a "second" slide element 556, and an 
"end" slide element 558. The simple structure of the document instance 550 lends the 
corresponding parse tree naturally to partitioning at each slide element, though the parse 
tree also could be partitioned in other ways. 

Referring to FIGURES 13A and 13B, the compact tree representations of the DTD 352 
and the document instance 550 include a name table 560, a notation table 562, a CDATA 
table 564, an enumeration table 566, an attribute declaration table 568, a GI declaration 
table 570, a document instance table 572, an entity table 574, an attribute value table 576, 
and a child table 578, all similar to those described above. The CT representation for the 
document instance also includes an "element boundaries" table 580, which identifies the 
entries in the attribute value, child, and document instance tables that correspond to each 
of four partitions, or sub-trees, in the CT representation of the document instance. The 
first partition 582 ("top") identified in the element boundaries table 580 includes only the 
top-level elements of the document: the "slideshow" element 552 and each of the "slide" 
sub-elements 554, 556, and 558. The next partition 584 ("first") in the element boundaries 
table 580 includes only those sub-elements associated with the "first" slide element 554. 
Likewise, the next partition 586 ("second") includes only those sub-elements associated 
with the "second" slide element 556, and the last partition 588 ("end") includes only
those sub-elements associated with the "end" slide element 558. 

For each sub-tree, the element boundaries table 580 includes seven information fields: an 
"element" field 590 identifying the sub-tree; an "attribute value start" field 592 and an 
"attribute value end" field 594 identifying the indices of the first and last entries, 
respectively, in the attribute value table 576 corresponding to the sub-tree; a "child start" 
field 596 and a "child end" field 598 identifying the indices of the first and last entries, 
respectively, in the child table 578 corresponding to the sub-tree; and a "document 
instance start" field 600 and a "document instance end" field 602 identifying the indices
of the first and last entries, respectively, in the document instance table 572 corresponding
to the sub-tree.
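One row of the element boundaries table can be modeled as three inclusive index ranges, and a sub-tree extracted by slicing the three per-element tables. The sketch below is illustrative: `SubtreeBounds`, `take`, and `extract_subtree` are hypothetical names, the ranges are the ones quoted in the text for the "top" and "second" partitions, and the tables are placeholders whose entries are simply their own indices.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Range = Tuple[int, int]  # inclusive (start, end) pair of table indices

@dataclass(frozen=True)
class SubtreeBounds:
    """One row of the element boundaries table."""
    attr_value: Range  # "attribute value start/end" fields 592, 594
    child: Range       # "child start/end" fields 596, 598
    instance: Range    # "document instance start/end" fields 600, 602

def take(table: Sequence, bounds: Range) -> list:
    start, end = bounds
    return list(table[start:end + 1])  # end is inclusive, hence + 1

def extract_subtree(b: SubtreeBounds, attr_values: Sequence, children: Sequence,
                    instance: Sequence) -> Dict[str, list]:
    """Slice the three per-element tables down to one partition."""
    return {
        "attribute_value": take(attr_values, b.attr_value),
        "child": take(children, b.child),
        "document_instance": take(instance, b.instance),
    }

# Index ranges quoted in the text for two of the four partitions.
BOUNDARIES = {
    "top": SubtreeBounds((10, 14), (8, 19), (16, 20)),
    "second": SubtreeBounds((6, 8), (4, 6), (8, 13)),
}

# Placeholder tables whose entries are just their own indices.
attr_values, children, instance = list(range(15)), list(range(20)), list(range(21))
part = extract_subtree(BOUNDARIES["second"], attr_values, children, instance)
print({k: len(v) for k, v in part.items()})
# {'attribute_value': 3, 'child': 3, 'document_instance': 6}
```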

When the requesting computer first requests partitioned delivery of the document, the 
CDATA table 564, the name table 560, the notation table 562, and the entity table 574
are sent in their entirety, but only the portions corresponding to the "top" sub-tree 582 are 
sent for the attribute value table 576 (indices 10 through 14), the child table 578 (indices 
8 through 19), and the document instance table 572 (indices 16 through 20). When the 
requesting computer later requests a child element of one of the "slide" elements, only 
those entries in the attribute value, child, and document instance tables corresponding to 
that "slide" element are sent. For example, if a child element of the "second" slide 
element 556 is requested, the requesting computer receives only the seventh through ninth 
entries (indices 6 through 8) from the attribute value table 576, the fifth through seventh 
entries (indices 4 through 6) from the child table 578, and the ninth through fourteenth 
entries (indices 8 through 13) from the document instance table 572. 
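The delivery sequence above (shared tables and the "top" sub-tree first, then one slide's entries on demand) can be sketched as a server that answers from the element boundaries and a client that caches what it has received. This is a sketch under stated assumptions, not the patented protocol: `PartitionServer`, `PartitionClient`, and the table keys are hypothetical, no network transport is modeled, and the tables are placeholders whose entries are their own indices. The index ranges are those quoted in the text.

```python
from typing import Dict, List, Tuple

SHARED = ("CDATA", "name", "notation", "entity")  # sent whole on the first request

# Inclusive (start, end) index ranges quoted in the text.
BOUNDS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "top": {"attribute_value": (10, 14), "child": (8, 19), "document_instance": (16, 20)},
    "second": {"attribute_value": (6, 8), "child": (4, 6), "document_instance": (8, 13)},
}

class PartitionServer:
    def __init__(self, tables: Dict[str, list], bounds=BOUNDS):
        self.tables = tables
        self.bounds = bounds

    def partition(self, key: str) -> Dict[str, list]:
        """Only the attribute value, child, and instance entries of one sub-tree."""
        return {name: self.tables[name][start:end + 1]
                for name, (start, end) in self.bounds[key].items()}

    def initial(self) -> Dict[str, list]:
        """First reply: shared tables in full plus the "top" sub-tree."""
        reply = {name: list(self.tables[name]) for name in SHARED}
        reply.update(self.partition("top"))
        return reply

class PartitionClient:
    """Caches delivered partitions so each slide's sub-tree is fetched once."""

    def __init__(self, server: PartitionServer):
        self.server = server
        self.base = server.initial()
        self.partitions: Dict[str, Dict[str, list]] = {}
        self.fetches: List[str] = []  # log of partition requests sent to the server

    def subtree(self, key: str) -> Dict[str, list]:
        if key not in self.partitions:
            self.fetches.append(key)
            self.partitions[key] = self.server.partition(key)
        return self.partitions[key]

# Placeholder tables whose entries are just their own indices.
tables = {name: list(range(25))
          for name in SHARED + ("attribute_value", "child", "document_instance")}
client = PartitionClient(PartitionServer(tables))
client.subtree("second")
client.subtree("second")  # served from the cache
print(client.fetches)     # ['second']
```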



Implementation 

The method of the invention may be implemented in hardware or software, or a 
combination of both. Preferably, however, the method of the invention is implemented
in computer programs executing on programmable processors each comprising a 
processor, a data storage system (including volatile and non-volatile memory and/or
storage elements), at least one input device, and at least one output device. 

Each program is preferably implemented in a high-level procedural or object-oriented
programming language to communicate with a processor. However, the programs can be 
implemented in assembly or machine language, if desired. In any case, the language may 
be a compiled or interpreted language. 

Each such computer program is preferably stored on a storage medium or device (e.g.,
ROM, flash RAM, or magnetic diskette) readable by a general or special purpose
programmable processor, for configuring and operating the processor when the storage
medium or device is read by the processor to perform the procedures described herein. The
inventive system may also be considered to be implemented as a processor-readable 
storage medium, configured with a computer program, where the storage medium so 
configured causes a processor to operate in a specific and predefined manner to perform 
the functions described herein. 

A number of embodiments of the present invention have been described. Nevertheless, 
it will be understood that various modifications may be made without departing from the 
spirit and scope of the invention. For example, the CT instance object may have 
additional headers that provide information such as character set identification, 
sequencing of objects, time stamps, ownership, and other parameters needed for 
managing or optimizing a distributed hypermedia system. Also, the CT DTD object and 
the CT instance object may be stored and transmitted with different formats than those 
shown here. Furthermore, structured document syntaxes other than SGML may be 
supported, either by handling non-SGML documents directly or by first translating non- 
SGML documents into the SGML syntax. Additionally, while an implementation has 
been described that uses a "client request" model, other models may be used, such as a 
subscriber or broadcast model (client requests a subscription to information, and the 
server from time to time publishes that information to the client without further requests), 
or a server push model (a client registers for specific information, and the server from
time to time sends such specific information without further request), or combinations of 
such models. 

Accordingly, it is to be understood that the invention is not to be limited by the specific 
illustrated embodiments, but only by the scope of the following claims. 
CLAIMS

What is claimed is: 

1. A computer network comprising:

(a) a client computer that authorizes receipt of a structured document, and is
capable of receiving such structured document; and

(b) a server computer that, upon receiving at least an initial receipt authorization
from a client computer, retrieves the structured document, parses the
document into a structural portion and a content portion, and thereafter
sends at least some of the two portions to the client computer.

2. The computer network of claim 1 wherein the server also sends the client
computer a document type definition corresponding to the requested document.

3. The computer network of claim 1 wherein the server compresses the structured
portion of the document into a compact tree.

4. The computer network of claim 1 wherein the server compresses the content
portion of the document.

5. The computer network of claim 1 further comprising a processor in the client that
traverses the structured portion and the content portion of the document to
reconstruct the document.

6. The computer network of claim 1 wherein the structured portion of the document
is arranged to allow incremental access by the client computer.
7. The computer network of claim 1 wherein the structured portion of the document
comprises a compact tree, at least one element of which comprises an associated
sub-tree representing sub-elements hierarchically subordinate to the element.

8. The apparatus of claim 7 wherein the compact tree comprises a table defining the 
structural elements of the document. 

9. The computer network of claim 1 wherein the structured document adheres to a 
formalized syntax. 

10. The computer network of claim 9 wherein the syntax comprises SGML. 

11. A computer data structure for storing structured document data in a persistent 
object storage of a network server computer, comprising a compact tree including 
a parse tree representing the structural hierarchy of the structured document, and 
a content list representing the data contained in the structured document. 

12. A persistent object storage in a network server computer configured to store 
structured document data as a compact tree comprising a parse tree representing 
the structural hierarchy of the structured document, and a content list representing 
the data contained in the structured document. 
13. A method of storing information representing a structured document in a
persistent object storage of a network server computer, the method comprising the 
steps of: 

(a) parsing the structured document to form a parse tree representing the 
structural hierarchy of the structured document and a content list 
representing data contained in the structured document, and 

(b) storing the parse tree and the content list in the persistent object storage. 

14. The method of claim 13 further comprising the step of storing in the persistent 
object storage a document type definition corresponding to the structured 
document. 

15. The method of claim 13 further comprising the step of compressing the content
list before storing it in the persistent object storage. 
16. A method of sharing a structured document in a computer network, the method
comprising the steps of:

(a) transmitting a parse tree representing the structural hierarchy of the
structured document and a content list representing data contained in the
structured document from a network server computer to a network client
computer,

(b) reconstructing the structured document in the network client computer
using the transmitted parse tree and content list.

17. The method of claim 16 further comprising the step of transmitting a document
type definition corresponding to the structured document from the network server
computer to the network client computer.

18. The method of claim 16 wherein the content list comprises compressed text data.

19. The method of claim 18 further comprising the step of expanding the compressed
text data after transmitting it to the network client computer.

20. The method of claim 16 wherein the parse tree and the content list are transmitted
incrementally to the network client computer.

21. The method of claim 16 wherein the parse tree is transmitted to the network client
computer as partitioned sub-trees.
22. A method of displaying a structured document in a network client computer, the
method comprising the steps of:

(a) receiving from a network server computer a parse tree representing the 
structural hierarchy of the structured document and a content list 
representing data contained in the structured document, 

(b) reconstructing the structured document in the network client computer 
using the received parse tree and content list, and 

(c) displaying the reconstructed structured document on a display device 
attached to the network client computer. 
23. A computer program, residing on a computer-readable medium, comprising
instructions for causing a server processor to parse structured document data
into a compact tree comprising a parse tree representing the structural hierarchy
of the structured document, and a content list representing the data contained in
the structured document.

24. A computer program, residing on a computer-readable medium, for representing
a structured document in a persistent object storage of a network server computer,
comprising instructions for causing a processor to:

(a) parse the structured document to form a parse tree representing the
structural hierarchy of the structured document and a content list
representing data contained in the structured document, and

(b) store the parse tree and the content list in the persistent object storage.

25. A computer program, residing on a computer-readable medium, for sharing a
structured document in a computer network, comprising instructions for causing
at least one processor to:

(a) transmit a parse tree representing the structural hierarchy of the structured
document and a content list representing data contained in the structured
document from a network server computer to a network client computer,

(b) reconstruct the structured document in the network client computer using
the transmitted parse tree and content list.
26. A computer program, residing on a computer-readable medium, for displaying a
structured document in a network client computer, comprising instructions for
causing a processor to:

(a) receive from a network server computer a parse tree representing the
structural hierarchy of the structured document and a content list
representing data contained in the structured document,

(b) reconstruct the structured document in the network client computer using
the received parse tree and content list, and

(c) display the reconstructed structured document on a display device
attached to the network client computer.